Systems and methods for creating a reorganization-immune blockchain index using mono-increasing sequence records

ABSTRACT

Systems and methods for creating a reorganization-immune blockchain index using mono-increasing sequence records are described. For example, the system may receive on-chain data for a plurality of blocks, wherein the plurality of blocks comprises a first block comprising a first event of a plurality of blockchain events within the on-chain data. The system may determine a first sequence number for the first event, wherein the first sequence number is based on a mono-increasing sequence record.

BACKGROUND

In recent years, the use of blockchains and blockchain technology has exponentially increased. Blockchains comprise a list of records, called “blocks,” that are “chained” together using cryptography. Each block may comprise data that is computed using a one-way function (e.g., a function that is practically impossible to invert or reverse-compute) of a previous block, a timestamp (e.g., indicating a creation and/or modification time), and additional data (e.g., transactional or operational data related to blockchain operations).

While publicity for blockchains and blockchain technology has been concentrated on its use for cryptocurrencies and smart contracts, blockchains and blockchain technology may be applicable to numerous technological avenues. A common theme of the technological avenues is the manner in which blockchains and blockchain technology are decentralized such that facilitation, management, and/or verification of blockchain based operations is governed or administered not by any one authority but instead by a community of users. The blockchain may therefore remain distributed (e.g., on a network of computers that communicate and coordinate their actions by passing messages to one another), and in many cases public, through a digital ledger, which records the series of blocks forming the chain. Notably, because each block depends on a preceding block, edits to existing blocks in the chain may not be made without affecting subsequent blocks.

Furthermore, updates to the blockchain (e.g., the addition of new blocks) may include incentivization systems that reward community members for the generation of the updates while also ensuring a consensus by the community. By doing so, the proliferation of the blockchain may proceed indefinitely.

SUMMARY

The use of blockchain technology and applications that rely on blockchain technology has grown exponentially. To use blockchain data, an application often needs to index the blockchain data. Given the decentralized nature of the blockchain, a typical approach is to extract the relevant data from the blockchain itself and then organize and/or distribute the data according to the needs of the application. As there is no common platform for indexing the data, the same process is repeated over and over again for each new application being created or on-boarded to an existing application ecosystem.

Developing a common platform faces numerous technical hurdles. First, blockchain data is constantly changing as new chains and protocols are developed. As such, any common platform would need to be compatible with these new chains and protocols. Second, any common platform would need to be able to handle chain reorganizations. For example, while blocks in a blockchain may be immutable, which fork in the blockchain is canonical may change.

These problems are exacerbated by the underlying data availability issues of the blockchain. Specifically, archival nodes with a complete state of the blockchain are expensive to operate and data extraction from a node on an ad hoc basis is unreliable and slow. Conventional approaches for data management in other technical fields are also not effective. For example, in a conventional distributed computing system (i.e., non-blockchain system), a system may distribute processing tasks between a pool of load-balanced nodes (e.g., in a master-slave arrangement) with the system maintaining continuity between the results of each task. However, blockchain nodes are fundamentally different because the nodes act in a master-to-master arrangement with their states maintaining consistency with the blockchain.

In view of these technical problems, aspects are described herein for improvements to blockchain technology, and in particular, indexing blockchain data using a bifurcated indexing system with a dynamic compute engine.

For example, one technical problem to overcome related to indexing blockchain data is that the data sources are constantly changing (e.g., forks may develop in existing blockchains, new protocols and blockchains are being created, etc.). As such, any standardized indexing schema is only able to handle current fields. If a new field is needed (e.g., based on a new protocol, blockchain, etc.), then the entire blockchain index must be redone.

In view of this, the systems and methods provide for a unified approach that is compatible with all blockchains, protocols, etc. To accomplish this, the systems and methods use a bifurcated indexing system with a dynamically selected application service. Specifically, as opposed to conventional indexing, the systems and methods bifurcate the indexing process into a storage layer and a compute layer. By doing so, the system may modify any processing schema (e.g., what data format is used, what compute engine is used, etc.) without affecting a storage schema. For example, the systems and methods decouple the storage system from the compute system, which allows the storage system to scale out (or up) as dictated by the workload. Furthermore, the system may use a storage schema that stores data as files with predefined formats and at different granularity levels (e.g., in a blockchain-interface layer and a data lakehouse layer). By doing so, the systems and methods enable other layers, for example, the application service layer of the indexing application, to choose the most appropriate data format (e.g., use a data format and compute engine that is best suited for the task) for processing the stored data. As an additional technical benefit, the systems and methods allow for different processing layers to be used (e.g., select a specific application service layer based on a given task) as well as multiple storage layers based on a given task (e.g., a blockchain-interface layer comprising raw blockchain data, a data lakehouse layer comprising a set of cleansed data, etc.).

In some aspects, systems and methods for improved blockchain data indexing by decoupling compute and storage layers are described. For example, the system may receive, at a blockchain-interface layer, first on-chain data from a blockchain node of a blockchain network, wherein the first on-chain data comprises hexadecimal encoded data from a first block of the blockchain network, wherein the blockchain-interface layer transforms the first on-chain data to a first format, using a first compute engine, for storage in a first dataset, and wherein the first format comprises data types with field names identified by a respective integer. The system may receive, at a data lakehouse layer, the first on-chain data in the first format, wherein the data lakehouse layer transforms the first on-chain data to a second format, using a second compute engine, for storage in a second dataset, wherein the second format comprises a columnar oriented format, wherein the second dataset comprises the first on-chain data and second on-chain data, and wherein the second on-chain data is from a second block on the blockchain network. The system may determine an application characteristic for an application that performs blockchain operations using the first on-chain data or the second on-chain data. The system may receive, at an application service layer, the first on-chain data and the second on-chain data in the second format, wherein the application service layer transforms, using a third compute engine, the first on-chain data and the second on-chain data to a third format for storage in a third dataset, and wherein the third format is dynamically selected based on the application characteristic. The system may transmit the first on-chain data and the second on-chain data in the third format to the application.

In further view of the technical problems cited above, aspects are described herein for improvements to blockchain technology, and in particular, indexing blockchain data using blockchain node balancing.

For example, one technical hurdle to indexing blockchain data is how to extract data from the node efficiently. One naïve approach would be querying from a single node, thereby eliminating the need to deal with chain reorganization or inconsistent states between the nodes. However, this approach is bottlenecked by the limited throughput of a single node. On the other hand, if blocks are queried from a pool of load-balanced nodes, potentially inconsistent states between the nodes would have to be resolved (e.g., requiring the system to introduce a consensus algorithm to resolve the potentially inconsistent states).

In view of this, the systems and methods provide for a novel blockchain node balancing approach using sticky master nodes. For example, the systems and methods may first select a plurality of nodes comprising designated master nodes and slave nodes. The system uses the master nodes to query the information as to what blocks are on the canonical chain. The system then enables a sticky session while reading from the master nodes so that the queries are served by the same node (falling back to a different node when the previous one becomes unhealthy). To improve the efficiency and speed of the query, the system may use batch application programming interfaces (APIs) to query a range of blocks, without requesting the full transaction objects. Once the block identifiers on the canonical chain are resolved from the master nodes, the full blocks are extracted in parallel and/or out of order from the slave nodes, which are backed by a pool of load-balanced nodes.
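As a non-limiting illustration, the sticky master node approach described above may be sketched as follows. This is a minimal sketch only: the `Node`, `StickySession`, and `index_range` names, the in-memory `chain` mapping, and the round-robin assignment of blocks to slave nodes are hypothetical stand-ins chosen for illustration and do not represent any particular node client or API.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical stand-in for a blockchain node client."""

    def __init__(self, name, chain, healthy=True):
        self.name = name
        self.chain = chain      # height -> block hash on this node's canonical chain
        self.healthy = healthy

    def block_ids(self, start, end):
        # Batch query: only (height, hash) pairs, no full transaction objects.
        return [(h, self.chain[h]) for h in range(start, end)]

    def full_block(self, height, block_hash):
        # Placeholder for fetching the full block body.
        return {"height": height, "hash": block_hash, "served_by": self.name}


class StickySession:
    """Keeps reading from one master node; falls back when it is unhealthy."""

    def __init__(self, masters):
        self.masters = masters
        self.current = None

    def node(self):
        if self.current is None or not self.current.healthy:
            self.current = next(m for m in self.masters if m.healthy)
        return self.current


def index_range(session, slaves, start, end):
    # Step 1: resolve canonical block identifiers from the sticky master node.
    ids = session.node().block_ids(start, end)

    # Step 2: fetch full blocks in parallel from the slave pool (round-robin here).
    def fetch(item):
        i, (height, block_hash) = item
        return slaves[i % len(slaves)].full_block(height, block_hash)

    with ThreadPoolExecutor(max_workers=len(slaves)) as pool:
        blocks = list(pool.map(fetch, enumerate(ids)))

    # Step 3: index in canonical order, regardless of fetch completion order.
    return sorted(blocks, key=lambda b: b["height"])
```

In this sketch, only the master node decides which block identifiers are canonical, so the slave nodes never need to agree with one another; they merely serve the block bodies that were requested by identifier.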

In some aspects, systems and methods for improved blockchain data indexing by avoiding throughput bottlenecks caused by reliance on a single blockchain node are described. For example, the system may designate a first blockchain node of a plurality of blockchain nodes for a blockchain network as having a first node type. The system may, based on designating the first blockchain node of the plurality of blockchain nodes as having the first node type, establish a session with the first blockchain node. While maintaining the session, the system may determine an order of a first block and a second block on a canonical chain of the blockchain network, designate a second blockchain node and a third blockchain node of the plurality of blockchain nodes as having a second node type, based on designating the second blockchain node and the third blockchain node of the plurality of blockchain nodes as having the second node type, transmit, in parallel, queries to the second blockchain node and the third blockchain node for first on-chain data from the first block and second on-chain data from the second block, respectively, and/or receive the first on-chain data or the second on-chain data. In response to receiving the first on-chain data or the second on-chain data, the system may index, in a first dataset, the first on-chain data or the second on-chain data based on the order of the first block and the second block on the canonical chain.

In further view of the technical problems cited above, aspects are described for improvements to blockchain technology, and in particular providing reorganization immunity.

One technical hurdle in designing an indexing application is how to handle blockchain reorganizations. For example, although the blocks themselves are immutable in the blockchain, what constitutes the canonical chain could change due to a chain reorganization.

In view of this, the systems and methods create a reorganization-immune blockchain index using mono-increasing sequence records. For example, instead of overwriting a stored dataset (e.g., in a storage layer) when a change is seen, the system models changes as a strictly ordered sequence of added (+) or removed (−) events, with each event associated with a mono-increasing sequence number. Notably, such a management of sequences is unnecessary for normal blockchain data as the data in the blocks (e.g., events) themselves are immutable and thus, there would be little need for determining this information and appending this information to a dataset of indexed blockchain data. By doing so, the system may implement change-data-capture patterns across the events. For example, the system may reconstruct a canonical chain (e.g., following a reorganization) by grouping the events by height and taking the item with the largest sequence number from each group.
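As a non-limiting illustration, the reconstruction of a canonical chain from mono-increasing sequence records may be sketched as follows. The `BlockEvent` and `EventLog` names and fields are hypothetical. The sketch also assumes that a height whose latest event is a removal (−) is simply absent from the canonical chain; that handling is an illustrative assumption rather than a requirement stated above.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class BlockEvent:
    sequence: int    # mono-increasing sequence number
    kind: str        # "+" (added) or "-" (removed)
    height: int
    block_hash: str


class EventLog:
    """Append-only log of block events; nothing is ever overwritten."""

    def __init__(self):
        self._next = count(1)
        self.events = []

    def append(self, kind, height, block_hash):
        event = BlockEvent(next(self._next), kind, height, block_hash)
        self.events.append(event)
        return event

    def canonical_chain(self):
        # Group events by height and keep the one with the largest sequence number.
        latest = {}
        for event in self.events:
            best = latest.get(event.height)
            if best is None or event.sequence > best.sequence:
                latest[event.height] = event
        # Assumption: a height whose latest event is a removal is not canonical.
        return {h: e.block_hash for h, e in sorted(latest.items()) if e.kind == "+"}


log = EventLog()
for height, block_hash in [(1, "a"), (2, "b"), (3, "c")]:
    log.append("+", height, block_hash)

# A reorganization replaces the blocks at heights 2 and 3.
log.append("-", 3, "c")
log.append("-", 2, "b")
log.append("+", 2, "b2")
log.append("+", 3, "c2")

print(log.canonical_chain())  # {1: 'a', 2: 'b2', 3: 'c2'}
```

Because the log is only ever appended to, a downstream consumer that has processed events up to a given sequence number can resume from that number after a reorganization without re-reading earlier events.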

In some aspects, systems and methods for creating a reorganization-immune blockchain index using mono-increasing sequence records are described. For example, the system may receive on-chain data for a plurality of blocks, wherein the plurality of blocks comprises a first block comprising a first event of a plurality of blockchain events within the on-chain data. The system may determine a first sequence number for the first event. The system may determine a first chain height for the first block. The system may detect a blockchain network reorganization. In response to the blockchain network reorganization, the system may determine whether the first sequence number corresponds to a highest sequence number among respective sequence numbers for the plurality of blocks that have the first chain height, determine that the first block corresponds to a canonical chain for a blockchain network based on determining that the first sequence number corresponds to the highest sequence number among respective sequence numbers for the plurality of blocks that have the first chain height, and/or update a blockchain index to indicate that the first block corresponds to the canonical chain.

In further view of the technical problems cited above, aspects are described herein for improvements to blockchain technology, and in particular to improving the processing speed of raw blockchain data.

For example, even if improvements to retrieving blockchain data from a blockchain node and storing blockchain data in a reorganization-immune blockchain index are achieved, indexing applications still face a technical hurdle to interfacing with legacy (e.g., non-blockchain based) systems. For example, building legacy applications on top of raw blockchain datasets is a tedious process, as the raw blockchain datasets need to support both batch processing and streaming data applications, load and process data incrementally, and provide a near-constantly materialized dataset.

In view of this, the systems and methods cleanse raw blockchain data into an append-only delta table that may be accessed by legacy applications. For example, as new raw blockchain data is received, the systems and methods model the data stream as an unbounded, continuously updated table. By doing so, as new data is made available in the input data stream, one or more rows are appended to the unbounded table as a micro batch. From the perspective of downstream applications, the query on this conceptual input table can be defined as if it were a static table. As such, the append-only delta table supports both batch processing and streaming data applications, enables data to be loaded and processed incrementally, and provides a near-constantly materialized dataset.
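As a non-limiting illustration, the unbounded, append-only table described above may be sketched with an in-memory structure. The `UnboundedTable` class and its method names are hypothetical; a production system would typically rely on a table format and streaming engine rather than a Python list.

```python
class UnboundedTable:
    """Append-only table that grows by micro batches (illustrative sketch)."""

    def __init__(self):
        self._rows = []

    def append_micro_batch(self, rows):
        # New rows are only ever appended; existing rows are never modified.
        batch = list(rows)
        self._rows.extend(batch)
        return len(batch)

    def query(self, predicate=lambda row: True):
        # Batch-style access: query the table as if it were static.
        return [row for row in self._rows if predicate(row)]

    def read_since(self, offset):
        # Streaming-style access: return rows after `offset` and the new offset.
        return self._rows[offset:], len(self._rows)


table = UnboundedTable()
table.append_micro_batch([{"height": 1, "tx_count": 3}, {"height": 2, "tx_count": 0}])

# A streaming consumer reads incrementally, remembering its offset.
rows, offset = table.read_since(0)

table.append_micro_batch([{"height": 3, "tx_count": 7}])
new_rows, offset = table.read_since(offset)

# A batch consumer queries the same table as a whole.
busy_blocks = table.query(lambda row: row["tx_count"] > 0)
```

The same table serves both access patterns in this sketch: `read_since` processes only newly appended rows, while `query` sees every row materialized so far.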

In some aspects, systems and methods for supporting both batch processing and streaming data applications, to load and process data incrementally, while providing a near-constantly materialized dataset based on raw blockchain data, are described. For example, the system may receive, at a data lakehouse layer, first on-chain data in a first format via a first input stream, wherein the first on-chain data originates from a blockchain node of a blockchain network. The system may transform the first on-chain data to a second format for storage in a second dataset, wherein the second format comprises an unbounded table, and wherein transforming the first on-chain data to the second format comprises: detecting first new on-chain data in the first input stream; appending the first new on-chain data to the unbounded table as a micro batch; and storing the first new on-chain data in the second dataset. The system may generate an output based on the second dataset.

Various other aspects, features, and advantages of the invention will be apparent through the detailed description of the invention and the drawings attached hereto. It is also to be understood that both the foregoing general description and the following detailed description are examples and are not restrictive of the scope of the invention. As used in the specification and in the claims, the singular forms of “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. In addition, as used in the specification and the claims, the term “or” means “and/or” unless the context clearly dictates otherwise. Additionally, as used in the specification, “a portion” refers to a part of, or the entirety of (i.e., the entire portion), a given item (e.g., data) unless the context clearly dictates otherwise.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an illustrative diagram for a layered approach to blockchain storage and processing, in accordance with one or more embodiments.

FIG. 2 shows an illustrative diagram for conducting blockchain operations, in accordance with one or more embodiments.

FIG. 3 shows an illustrative diagram for a decentralized application, in accordance with one or more embodiments.

FIG. 4 shows an illustrative diagram for conducting operations in a decentralized application using blockchain operations, in accordance with one or more embodiments.

FIG. 5 shows an illustrative diagram for a blockchain indexer, in accordance with one or more embodiments.

FIG. 6 shows an illustrative diagram illustrating mono-increasing sequence records, in accordance with one or more embodiments.

FIG. 7 shows an illustrative diagram for a blockchain indexer, in accordance with one or more embodiments.

FIG. 8 shows a flowchart of the steps involved in improving blockchain data indexing by decoupling compute and storage layers, in accordance with one or more embodiments.

FIG. 9 shows a flowchart of the steps involved in improving blockchain data indexing by avoiding throughput bottlenecks caused by reliance on a single blockchain node, in accordance with one or more embodiments.

FIG. 10 shows a flowchart of the steps involved in creating a reorganization-immune blockchain index using mono-increasing sequence records, in accordance with one or more embodiments.

FIG. 11 shows a flowchart of the steps involved in supporting both batch processing and streaming data applications, to load and process data incrementally, while providing a near-constantly materialized dataset based on raw blockchain data, in accordance with one or more embodiments.

DETAILED DESCRIPTION OF THE DRAWINGS

In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It will be appreciated, however, by those having skill in the art that the embodiments of the invention may be practiced without these specific details or with an equivalent arrangement. In other cases, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the embodiments of the invention.

FIG. 1 shows an illustrative diagram for a layered approach to blockchain storage and processing, in accordance with one or more embodiments. For example, FIG. 1 includes diagram 100 and chart 150. Diagram 100 illustrates the layered approach to blockchain storage and processing through the use of various datasets and/or layers. For example, diagram 100 includes dataset 102, dataset 104, and dataset 106. For example, diagram 100 illustrates a novel architecture, which enables different applications to access and use blockchain data in an efficient and flexible manner. For example, diagram 100 may illustrate an architecture of a data modeling or structuring tool of continuously running streaming applications, interconnected by datasets at different quality levels. For example, rather than going directly from raw data (e.g., blockchain data) to finished product (e.g., curated data), the data quality is improved incrementally until it is ready for consumption. Furthermore, using this architecture, new data may be processed incrementally as it arrives, and the same programming model may be reused to efficiently process historical data, which improves the maintainability of the system.

Notably, the architecture, unlike that of traditional approaches, decouples the storage system from the compute system, which allows the storage system to scale out (or up) as dictated by the workload. Furthermore, the data is saved as files with open formats at different granularity levels, enabling different layers to choose the most appropriate compute engine. This flexibility, namely the ability to choose the storage system, data format, and compute engine that are best suited for the workloads at hand, is a key advantage of the architecture.

Diagram 100 may represent a multi-layer data platform for indexing on-chain data. As shown in FIG. 1, the multi-layer data platform may comprise a plurality of datasets, each with their own respective characteristics (e.g., as shown by chart 150). The datasets may transition from a lower quality to a higher quality dataset (e.g., transition from raw data to highly curated data).

For example, the multi-layer data platform may comprise a dataset 102. Dataset 102 may receive raw on-chain data (e.g., hexadecimal encoded data) from one or more blocks of a blockchain network via a blockchain node. Dataset 102 may be populated by the system transforming the raw on-chain data to a first format. For example, as indicated by chart 150, the first dataset may comprise a structured data structure defined in protocol buffers (Protobuf) format. For example, Protobuf is a data format used to serialize structured data. Protobuf comprises an interface description language that describes the structure of some data and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data. For example, the first dataset may use a schema that associates data types with field names, using integers to identify each field. That is, the data may contain only the numbers, not the field names, which generates bandwidth/storage savings as compared with schemas that include the field names in the data.
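As a non-limiting illustration of how identifying fields by integer saves space, the following sketch hand-encodes a single unsigned integer field in the Protobuf wire format (a tag built from the field number and wire type, followed by a varint value) and compares its size with a JSON encoding that carries the field name. The `height` field name and the field number 1 are hypothetical; in practice, generated Protobuf code would perform this encoding.

```python
import json


def encode_varint(value):
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_uint_field(field_number, value):
    """Encode one varint field: tag = (field_number << 3) | wire_type 0."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


# Hypothetical schema: field number 1 is the block height.
proto_bytes = encode_uint_field(1, 150)            # bytes 08 96 01
json_bytes = json.dumps({"height": 150}).encode("utf-8")

print(proto_bytes.hex(), len(proto_bytes), len(json_bytes))
```

Only the field number appears in the encoded bytes; the mapping from the number back to a field name lives in the schema, which is what yields the bandwidth and storage savings noted above.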

Dataset 102 may comprise a blockchain-interface layer and may use a compute engine, wherein the compute engine comprises a first workflow architecture, wherein the first workflow architecture comprises a first threshold for workflow throughput and a first threshold for a number of workflows. For example, the system may select a compute engine for processing data in the first dataset based on the workflow architecture of the compute engine. For example, the main limitation of a workflow architecture with a low threshold for workflow throughput (e.g., a threshold rate at which events may be processed) and a high threshold number of workflows (e.g., a threshold number of workflows that may simultaneously process events) is in data processing situations with a high amount of aggregation. For example, a workflow architecture with a low threshold for workflow throughput and a high threshold number of workflows has a limited throughput for each workflow, but this workflow architecture allows for the total number of workflows to be high. Such a workflow architecture is well suited for a dataset based on events corresponding to individual workflows (e.g., updates for given smart contracts, tokens, etc.). For example, a workflow architecture of this type may aggregate events per smart contract, token, etc., for millions of different smart contracts, tokens, etc., as the rate of events for each of these is low (e.g., less than 30 events per second). In contrast, such a workflow architecture may be ill suited for processing a dataset and/or use cases involving a high number of events in a low number of workflows. Additionally, the system may select a second compute engine (e.g., for the same or another layer and/or dataset) for processing data in a dataset based on the workflow architecture of the second compute engine. Furthermore, as the second dataset comprises on-chain data for a plurality of blocks, the workflow architecture for the second compute engine may require the ability to process a high rate of events. For example, as the second dataset processes and stores data at a different level of granularity, the second compute engine may require fewer individual workflows (e.g., a lower threshold of a number of workflows) and instead a higher rate of event processing (e.g., a high threshold for workflow throughput).

Dataset 104 may comprise, at a data lakehouse layer, a dataset that receives first on-chain data in the first format. The data lakehouse layer may transform the first on-chain data to a second format, using a second compute engine, for storage in a second dataset, wherein the second format comprises a columnar oriented format, wherein the second dataset comprises the first on-chain data and second on-chain data, and wherein the second on-chain data is from a second block on the blockchain network. For example, while the first dataset may comprise structured or semi-structured raw blockchain data, and thus delay error-prone parsing and data augmentation until later, raw blockchain data (even in a structured or semi-structured format) is difficult to use to run applications. For example, to speed up the reprocessing of the raw blockchain data, the system may build different batch processing pipelines; however, the underlying code cannot be reused for streaming processing. As such, a data lakehouse layer may comprise a different data structure type. A data lakehouse is a data solution concept that combines elements of the data warehouse with those of the data lake. Data lakehouses implement data warehouses' data structures and management features for data lakes, which are typically more cost-effective for data storage.

For example, the second dataset may comprise a columnar oriented format, which is best fitted for analytic workloads. For example, the second dataset may represent a cleansed and partitioned dataset (e.g., in contrast to the first dataset, which may comprise raw blockchain data, and the third dataset, which may be curated based on application use cases). For example, the columnar oriented format may preserve local copies (files) of remote data on worker nodes, which may avoid remote reads during instances of a high volume of event processing.

Dataset 106 may comprise an application service layer that receives the first on-chain data and the second on-chain data in the second format (or other format of another layer). The application service layer may transform, using a third compute engine, the first on-chain data and the second on-chain data to a third format for storage in a third dataset, wherein the third format is dynamically selected based on the application characteristic. Furthermore, the third dataset may be structured based on application needs. Additionally, the dataset may be continuously and incrementally updated based on information received from lower layers and/or the blockchain node, as well as information received by an API layer of an application. The third dataset may therefore be customized to meet the needs and formatting requirements of the API for the application. For example, the system may serve an API layer of the application. In such cases, the format used by the application service layer may be based on the API layer.

For example, the API layer of the applications can subscribe to a Kafka topic to perform further processing. For example, asset discovery of ERC-20, ERC-721, ERC-1155, etc., can be implemented this way. As one example, an application service layer may be responsible for producing the transfer events based on the token standards, and then an Asset Discovery Service (or other layer) may pull in additional on-chain (e.g., symbol/decimals) and off-chain (e.g., token icon) metadata asynchronously. An optimization may also be done in an application service layer to deduplicate the transfer events of the same address using time-based window aggregation. That is, the application service layer may use specific formats and perform specific operations based on the needs of an application and/or the best mechanism for optimizing the application (and/or its interactions with other layers/applications/data sources).
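As a non-limiting illustration, deduplicating transfer events of the same address using time-based window aggregation may be sketched as follows. The sketch assumes fixed (tumbling) windows, a 60-second default window size, and event dictionaries with hypothetical `address` and `timestamp` keys; none of these specifics are prescribed above, and a streaming engine's windowing operators could be used instead.

```python
def dedupe_transfers(events, window_seconds=60):
    """Keep the first transfer event per (address, time window).

    Assumes each event is a dict with an 'address' string and an integer
    'timestamp' in seconds; both key names are illustrative.
    """
    seen = set()
    result = []
    for event in events:
        window = event["timestamp"] // window_seconds   # tumbling window index
        key = (event["address"], window)
        if key not in seen:
            seen.add(key)
            result.append(event)
    return result


transfers = [
    {"address": "0xA", "timestamp": 0},
    {"address": "0xB", "timestamp": 5},
    {"address": "0xA", "timestamp": 10},   # same address, same window: dropped
    {"address": "0xA", "timestamp": 70},   # same address, next window: kept
]
unique = dedupe_transfers(transfers)
```

With this approach, a downstream service such as asset discovery would be triggered at most once per address per window rather than once per transfer event.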

FIG. 2 shows an illustrative diagram for conducting blockchain operations, in accordance with one or more embodiments. For example, the diagram presents various components that may be used to conduct blockchain operations in some embodiments. FIG. 2 includes user device 202. User device 202 may include a user interface. As referred to herein, a “user interface” may comprise a mechanism for human-computer interaction and communication in a device and may include display screens, keyboards, a mouse, and the appearance of a desktop. For example, a user interface may comprise a way a user interacts with an application or website in order to perform blockchain indexing, and the user interface may display content related to blockchain data. As referred to herein, “content” should be understood to mean an electronically consumable user asset, representations of goods or services, including nonfungible tokens (NFTs), Internet content (e.g., streaming content, downloadable content, webcasts, etc.), video data, audio data, image data, and/or textual data, etc.

As shown in FIG. 2, system 200 may include multiple user devices (e.g., user device 202, user device 208, and/or user device 210). For example, system 200 may comprise a distributed state machine, in which each of the components in FIG. 2 acts as a client of system 200. For example, system 200 (as well as other systems described herein) may comprise a large data structure that holds not only all accounts and balances but also a state machine, which can change from block to block according to a predefined set of rules and which can execute arbitrary machine code. The specific rules of changing state from block to block may be maintained by a virtual machine (e.g., a computer file implemented on and/or accessible by a user device, which behaves like an actual computer) for the system.

It should be noted that, while shown as a smartphone, a personal computer, and a server in FIG. 2, the user devices may be any type of computing device, including, but not limited to, a laptop computer, a tablet computer, a hand-held computer, and/or other computing equipment (e.g., a server), including “smart,” wireless, wearable, and/or mobile devices. It should be noted that embodiments describing system 200 performing a blockchain operation may equally be applied to, and correspond to, an individual user device (e.g., user device 202, user device 208, and/or user device 210) performing the blockchain operation. That is, system 200 may correspond to the user devices (e.g., user device 202, user device 208, and/or user device 210) collectively or individually.

Each of the user devices may be used by the system to conduct blockchain operations and/or contribute to indexing blockchain operations. As referred to herein, “blockchain operations” may comprise any operations, including and/or related to blockchains and blockchain technology. For example, blockchain operations may include conducting transactions, querying a distributed ledger, generating additional blocks for a blockchain, transmitting communications-related NFTs, performing encryption/decryption, exchanging public/private keys, and/or other operations related to blockchains and blockchain technology. In some embodiments, a blockchain operation may comprise the creation, modification, detection, and/or execution of a smart contract or program stored on a blockchain. For example, a smart contract may comprise a program stored on a blockchain that is executed (e.g., automatically, without any intermediary's involvement or time loss) when one or more predetermined conditions are met. In some embodiments, a blockchain operation may comprise the creation, modification, exchange, and/or review of a token (e.g., a digital blockchain-specific asset), including an NFT. An NFT may comprise a token that is associated with a good, a service, a smart contract, and/or other content that may be verified by, and stored using, blockchain technology.

In some embodiments, blockchain operations may also comprise actions related to mechanisms that facilitate other blockchain operations (e.g., actions related to metering activities for blockchain operations on a given blockchain network). For example, Ethereum, which is an open-source, globally decentralized computing infrastructure that executes smart contracts, uses a blockchain to synchronize and store the system's state changes. Ethereum uses a network-specific cryptocurrency called ether to meter and constrain execution resource costs. The metering mechanism is referred to as “gas.” As the system executes a smart contract, the system accounts for every blockchain operation (e.g., computation, data access, transaction, etc.). Each blockchain operation has a predetermined cost in units of gas (e.g., as determined based on a predefined set of rules for the system). When a blockchain operation triggers the execution of a smart contract, the blockchain operation may include an amount of gas that sets the upper limit of what can be consumed in running the smart contract. The system may terminate execution of the smart contract if the amount of gas consumed by computation exceeds the gas available in the blockchain operation. For example, in Ethereum, gas comprises a mechanism for allowing Turing-complete computation while limiting the resources that any smart contract and/or blockchain operation may consume.

In some embodiments, gas may be obtained as part of a blockchain operation (e.g., a purchase) using a network-specific cryptocurrency (e.g., ether in the case of Ethereum). The system may require gas (or the amount of the network-specific cryptocurrency corresponding to the required amount of gas) to be transmitted with the blockchain operation as an earmark to the blockchain operation. In some embodiments, gas that is earmarked for a blockchain operation may be refunded back to the originator of the blockchain operation if, after the computation is executed, an amount remains unused.
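As an illustration only, the metering and refund behavior described above may be sketched as follows. The cost table, the operation names, and the function `run_with_gas_limit` are hypothetical and are not taken from any particular blockchain network; a real network defines its own cost schedule.

```python
class OutOfGas(Exception):
    """Raised when an operation would consume more gas than remains."""


# Hypothetical per-operation costs in units of gas (assumed values).
GAS_COSTS = {"compute": 3, "data_access": 200, "transaction": 21000}


def run_with_gas_limit(operations, gas_limit):
    """Charge gas for each operation in order.

    Returns (gas_used, gas_refunded). Raises OutOfGas as soon as the next
    operation would push consumption past the limit earmarked by the caller.
    """
    gas_used = 0
    for op in operations:
        cost = GAS_COSTS[op]
        if gas_used + cost > gas_limit:
            remaining = gas_limit - gas_used
            raise OutOfGas(f"{op} needs {cost} gas, only {remaining} left")
        gas_used += cost
    # Any unused portion of the earmarked gas goes back to the originator.
    return gas_used, gas_limit - gas_used
```

For example, under these assumed costs, `run_with_gas_limit(["transaction", "compute", "data_access"], 25000)` returns `(21203, 3797)`, while the same operations with a limit of 21100 raise `OutOfGas`.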

As shown in FIG. 2, one or more user devices may include a digital wallet (e.g., digital wallet 204) used to perform blockchain operations. For example, the digital wallet may comprise a repository that allows users to store, manage, and trade their cryptocurrencies and assets, interact with blockchains, and/or conduct blockchain operations using one or more applications. The digital wallet may be specific to a given blockchain protocol or may provide access to multiple blockchain protocols. In some embodiments, the system may use various types of wallets, such as hot wallets and cold wallets. Hot wallets are connected to the Internet, while cold wallets are not. Most digital wallet holders hold both a hot wallet and a cold wallet. Hot wallets are most often used to perform blockchain operations, while a cold wallet is generally used for managing a user account and may have no connection to the Internet.

As shown in FIG. 2, one or more user devices may include a cryptography-based, storage application (e.g., digital wallet 204) used to perform blockchain operations. The cryptography-based, storage application may be used to perform a plurality of blockchain operations across a computer network. The cryptography-based, storage application may, in some embodiments, correspond to a digital wallet. For example, the digital wallet may comprise a repository that allows users to store, manage, and trade their cryptocurrencies and assets, interact with blockchains, and/or conduct blockchain operations using one or more applications. The digital wallet may be specific to a given blockchain protocol or may provide access to multiple blockchain protocols. In some embodiments, the system may use various types of wallets, such as hot wallets and cold wallets. Hot wallets are connected to the Internet, while cold wallets are not. Digital wallet holders may hold both a hot wallet and a cold wallet. Hot wallets are most often used to perform blockchain operations, while a cold wallet is generally used for managing a user account and may have no connection to the Internet.

In some embodiments, the cryptography-based, storage application may correspond to a key-based wallet or a smart contract wallet. For example, a key-based wallet may feature public or private keys and allow a user to either have control of the account or receive transactions in the account. A smart contract wallet may comprise blockchain programs or digital agreements that execute transactions between parties once a predetermined condition is met. For example, a smart contract wallet may be managed by a smart contract (e.g., or smart contract code) instead of a private key. As such, a smart contract wallet may improve speed, accuracy, trust, and/or transparency in blockchain operations.

As shown in FIG. 2, one or more user devices may include a private key (e.g., key 212) and/or digital signature. For example, system 200 may use cryptographic systems for conducting blockchain operations that generate one or more on-chain events. As referred to herein, an event may comprise a state, status, or change thereof related to a block, blockchain operation, and/or a blockchain network. For example, the blockchain is a chain of information that includes the transaction information with a timestamp (e.g., events) that cannot be altered once it is recorded. For example, system 200 may use public key cryptography, which features a pair of digital keys (e.g., which may comprise strings of data). In such cases, each pair comprises a public key (e.g., which may be public) and a private key (e.g., which may be kept private). System 200 may generate the key pairs using cryptographic algorithms (e.g., featuring one-way functions). System 200 may then encrypt a message (or other blockchain operation) using an intended receiver's public key such that the encrypted message may be decrypted only with the receiver's corresponding private key. In some embodiments, system 200 may combine a message with a private key to create a digital signature on the message. For example, the digital signature may be used to verify the authenticity of blockchain operations. As an illustration, when conducting blockchain operations, system 200 may use the digital signature to prove to every node in the system that it is authorized to conduct the blockchain operations.

For example, system 200 may comprise a plurality of nodes for the blockchain network. Each node may correspond to a user device (e.g., user device 208). A node for a blockchain network may comprise an application or other software that records and/or monitors peer connections to other nodes and/or miners for the blockchain network. For example, a miner comprises a node in a blockchain network that facilitates blockchain operations by verifying blockchain operations on the blockchain, adding new blocks to the existing chain, and/or ensuring that these additions are accurate. The nodes may continually record the state of the blockchain and respond to remote procedure requests for information about the blockchain.

For example, user device 208 may request a blockchain operation (e.g., conduct a transaction). The blockchain operation may be authenticated by user device 208 and/or another node (e.g., a user device in the community network of system 200). For example, using cryptographic keys, system 200 may identify users and give access to their respective user accounts (e.g., corresponding digital wallets) within system 200. Using private keys (e.g., known only to the respective users) and public keys (e.g., known to the community network), system 200 may create digital signatures to authenticate the users.

Following an authentication of the blockchain operation (e.g., using key 212), the blockchain operation may be authorized. For example, after the blockchain operation is authenticated between the users, system 200 may authorize the blockchain operation prior to adding it to the blockchain. System 200 may add the blockchain operation to blockchain 206. System 200 may perform this based on a consensus of the user devices within system 200. For example, system 200 may rely on a majority (or other metric) of the nodes in the community network (e.g., user device 202, user device 208, and/or user device 210) to determine that the blockchain operation is valid. In response to validation of the block, a node user device (e.g., user device 202, user device 208, and/or user device 210) in the community network (e.g., a miner) may receive a reward (e.g., in a given cryptocurrency) as an incentive for validating the block.

To validate the blockchain operation, system 200 may use one or more validation protocols and/or validation mechanisms. For example, system 200 may use a proof-of-work mechanism in which a user device must provide evidence that it performed computational work to validate a blockchain operation. This mechanism thus provides a manner for achieving consensus in a decentralized manner, as well as for preventing fraudulent validations. For example, the proof-of-work mechanism may involve iterations of a hashing algorithm. The user device that is successful aggregates and records blockchain operations from a mempool (e.g., a collection of all valid blockchain operations waiting to be confirmed by the blockchain network) into the next block. Alternatively, or additionally, system 200 may use a proof-of-stake mechanism in which a user account (e.g., corresponding to a node on the blockchain network) is required to have, or “stake,” a predetermined amount of tokens in order for system 200 to recognize it as a validator in the blockchain network.
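By way of a non-limiting sketch, the iterated hashing of a proof-of-work mechanism may look like the following. The use of SHA-256, the 8-byte nonce encoding, and the notion of difficulty as a count of leading zero hexadecimal digits are simplifying assumptions made for illustration and do not reproduce the rules of any specific blockchain network.

```python
import hashlib


def _digest(block_data: bytes, nonce: int) -> str:
    # Hash the block contents together with the candidate nonce.
    return hashlib.sha256(block_data + nonce.to_bytes(8, "big")).hexdigest()


def mine(block_data: bytes, difficulty: int, max_nonce: int = 10_000_000):
    """Try nonces in order until the digest has `difficulty` leading zeros.

    Returns (nonce, digest), or None if no nonce below max_nonce qualifies.
    """
    prefix = "0" * difficulty
    for nonce in range(max_nonce):
        digest = _digest(block_data, nonce)
        if digest.startswith(prefix):
            return nonce, digest
    return None


def verify(block_data: bytes, nonce: int, difficulty: int) -> bool:
    """Checking the evidence costs one hash, however long mining took."""
    return _digest(block_data, nonce).startswith("0" * difficulty)
```

The asymmetry is the point of the sketch: `mine` may need many iterations, whereas any other node can confirm the result with a single call to `verify`.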

In response to validation of the block, the block is added to blockchain 206, and the blockchain operation is completed. For example, to add the blockchain operation to blockchain 206, the successful node (e.g., the successful miner) encapsulates the blockchain operation in a new block before transmitting the block throughout system 200.

FIG. 3 shows an illustrative diagram for a decentralized application, in accordance with one or more embodiments. For example, in some embodiments, system 300 may index blockchain operations and/or events for a decentralized application environment. A decentralized application may comprise an application that exists on a blockchain (e.g., blockchain 302) and/or a peer-to-peer network (e.g., network 306). That is, a decentralized application may comprise an application that has a back end that is in part powered by a decentralized peer-to-peer network, such as a decentralized, open-source blockchain with smart contract functionality.

For example, network 306 may allow user devices (e.g., user device 304) within network 306 to share files and access. In particular, the peer-to-peer architecture of network 306 allows blockchain operations (e.g., corresponding to blockchain 302) to be conducted between the user devices in the network, without the need of any intermediaries or central authorities.

In some embodiments, the user devices of system 300 may comprise one or more cloud components. For example, cloud components may be implemented as a cloud computing system and may feature one or more component devices. It should also be noted that system 300 is not limited to four devices. Users may, for instance, utilize one or more devices to interact with one another, one or more servers, or other components of system 300. It should be further noted that while one or more operations (e.g., blockchain operations) are described herein as being performed by a particular component (e.g., user device 304) of system 300, those operations may, in some embodiments, be performed by other components of system 300. As an example, while one or more operations are described herein as being performed by components of user device 304, those operations may, in some embodiments, be performed by one or more cloud components. In some embodiments, the various computers and systems described herein may include one or more computing devices that are programmed to perform the described functions. Additionally, or alternatively, multiple users may interact with system 300 and/or one or more components of system 300. For example, in one embodiment, a first user and a second user may interact with system 300 using two different components (e.g., user device 304 and user device 308, respectively). Additionally, or alternatively, a single user (and/or a user account linked to a single user) may interact with system 300 and/or one or more components of system 300 using two different components (e.g., user device 304 and user device 308, respectively).

With respect to the components of system 300, each of these devices may receive content and data via input/output (I/O) paths using I/O circuitry. Each of these devices may also include processors and/or control circuitry to send and receive commands, requests, and other suitable data using the I/O paths. The control circuitry may comprise any suitable processing, storage, and/or I/O circuitry. Each of these devices may also include a user input interface and/or user output interface (e.g., a display) for use in receiving and displaying data. For example, as shown in FIG. 3, both user device 308 and user device 310 include a display upon which to display data (e.g., content related to one or more blockchain operations).

Additionally, the devices in system 300 may run an application (or another suitable program). The application may cause the processors and/or control circuitry to perform operations related to blockchain operations within a decentralized application environment.

Each of these devices may also include electronic storages. The electronic storages may include non-transitory storage media that electronically stores information. The electronic storage media of the electronic storages may include one or both of (i) system storage that is provided integrally (e.g., is substantially non-removable) with servers or client devices, or (ii) removable storage that is removably connectable to the servers or client devices via, for example, a port (e.g., a USB port, a firewire port, etc.) or a drive (e.g., a disk drive, etc.). The electronic storages may include one or more optically readable storage media (e.g., optical disk, etc.), magnetically readable storage media (e.g., magnetic tape, magnetic hard drive, floppy drive, etc.), electrical charge-based storage media (e.g., EEPROM, RAM, etc.), solid-state storage media (e.g., flash drive, etc.), and/or other electronically readable storage media. The electronic storages may include one or more virtual storage resources (e.g., cloud storage, a virtual private network, and/or other virtual storage resources). The electronic storages may store software algorithms, information determined by the processors, information obtained from servers, information obtained from client devices, or other information that enables the functionality as described herein.

FIG. 3 also includes network 306, which may comprise communication paths between user devices. The communication paths may include the Internet, a mobile phone network, a mobile voice or data network (e.g., a 5G or LTE network), a cable network, a public switched telephone network, or other types of communication networks or combinations of communication networks. The communication paths may separately or together include one or more communication paths, such as a satellite path, a fiber-optic path, a cable path, a path that supports Internet communications (e.g., IPTV), free-space connections (e.g., for broadcast or other wireless signals), or any other suitable wired or wireless communication path or combination of such paths. The computing devices may include additional communication paths linking a plurality of hardware, software, and/or firmware components operating together. For example, the computing devices may be implemented by a cloud of computing platforms operating together as the computing devices.

FIG. 4 shows an illustrative diagram for conducting operations in a decentralized application using blockchain operations, in accordance with one or more embodiments. For example, system 400 may include user device 402. Furthermore, user device 402 may comprise an application (e.g., application 404) that is implemented on, and/or accessible by, user device 402. For example, application 404 may interact with one or more other applications and/or APIs in order to facilitate blockchain operations and/or indexing. For example, application 404 may comprise a decentralized application digital wallet and/or wallet service that is able to sign and send transactions to transfer tokens and/or perform other blockchain operations, as well as interact with one or more decentralized applications.

System 400 also includes API layer 406. In some embodiments, API layer 406 may be implemented on user device 402. Alternatively, or additionally, API layer 406 may reside on one or more cloud components (e.g., server 408). For example, API layer 406 may reside on server 408 and comprise a platform service for a custodial wallet service, decentralized application, etc. API layer 406 (which may be a representational state transfer (REST) or web services API layer) may provide a decoupled interface to data and/or functionality of one or more applications.

API layer 406 may provide various low-level and/or blockchain-specific operations in order to facilitate blockchain operations and/or indexing. For example, API layer 406 may provide blockchain operations such as blockchain writes. Furthermore, API layer 406 may perform a transfer validation ahead of forwarding the blockchain operation (e.g., a transaction) to another service (e.g., a crypto service). API layer 406 may then log the outcome. For example, by logging to the blockchain prior to forwarding, API layer 406 may maintain internal records and balances without relying on external verification (e.g., which may take up to ten minutes based on blockchain updating activity).

API layer 406 may also provide informational reads. For example, API layer 406 (or a platform service powered by API layer 406) may generate blockchain operation logs and write to an additional ledger (e.g., an internal record and/or indexer service) the outcome of the reads. If this is done, a user accessing the information through other means may see consistent information such that downstream users ingest the same data point as the user.

API layer 406 may also provide a unified API to access balances, transaction histories, and/or other blockchain operations activity records between one or more decentralized applications and custodial user accounts. By doing so, the system maintains the security of sensitive information such as the balances and transaction history. Alternatively, a mechanism for maintaining such security would separate the API access between the decentralized applications and custodial user accounts through the use of special logic. The introduction of the special logic decreases the streamlining of the system, which may result in system errors based on divergence and reconciliation.

API layer 406 may provide a common, language-agnostic way of interacting with an application. In some embodiments, API layer 406 may comprise a web services API that offers a well-defined contract that describes the services in terms of their operations and the data types used to exchange information. REST APIs do not typically have this contract; instead, they are documented with client libraries for most common languages including Ruby, Java, PHP, and JavaScript. Simple Object Access Protocol (SOAP) web services have traditionally been adopted in the enterprise for publishing internal services as well as for exchanging information with partners in business-to-business (B2B) transactions.

API layer 406 may use various architectural arrangements. For example, system 400 may be partially based on API layer 406, such that there is strong adoption of SOAP and RESTful web services, using resources such as Service Repository and Developer Portal, but with low governance, standardization, and separation of concerns. Alternatively, system 400 may be fully based on API layer 406, such that separation of concerns between layers, such as API layer 406, services, and applications, is in place.

In some embodiments, the system architecture may use a microservice approach. Such systems may use two types of layers: front-end layers and back-end layers, where microservices reside. In this kind of architecture, the role of API layer 406 may be to provide integration between front-end and back-end layers. In such cases, API layer 406 may use RESTful APIs (exposition to front-end or even communication between microservices). API layer 406 may use the Advanced Message Queuing Protocol (AMQP), which is an open standard for passing business messages between applications or organizations. API layer 406 may use an open-source, high-performance remote procedure call (RPC) framework that may run in a decentralized application environment. In some embodiments, the system architecture may use an open API approach. In such cases, API layer 406 may use commercial or open-source API platforms and their modules. API layer 406 may use a developer portal. API layer 406 may use strong security constraints applying a web application firewall that protects the decentralized applications and/or API layer 406 against common web exploits, bots, and distributed denial-of-service (DDoS) attacks. API layer 406 may use RESTful APIs as standard for external integration.

As shown in FIG. 4, system 400 may use API layer 406 to communicate with and/or facilitate blockchain operations with server 408. For example, server 408 may represent a custodial platform for blockchain operations. A custodial platform may manage private keys stored by a centralized service provider (e.g., server 408). In such cases, server 408 may interact with blockchain 410, a wallet service for blockchain 410, an indexer service for blockchain 410 (e.g., as described in FIG. 5), and/or other platform services.

For example, a wallet service may comprise an application and/or a software-based system that securely stores users' payment information, private keys, and/or passwords facilitating blockchain operations with websites, nodes, and/or other devices. In some embodiments, a wallet service may also provide additional ledger access (e.g., a second ledger). Furthermore, as discussed above, this second ledger may receive updates directly from API layer 406, as opposed to relying on data pulled directly from blockchain 410.

For example, system 400 may maintain its records (e.g., both live and for accounting) in good order separate from balances on blockchain 410. That is, system 400 may maintain an architecture featuring the second ledger, where balances are stored and updated, and the logs of blockchain operations. While conventional systems may rely on directly referencing blockchain 410, since the blockchain is the source of truth for the system, such reliance leads to additional technical problems.

First, there is a strong likelihood of impedance mismatch between a format for a platform service and the APIs used to retrieve data from the blockchain (e.g., which may lead to accounting imbalances). For example, system 400 may need to be able to generate accounting entries reflecting changes of balances. However, while changes of balances can be tracked by examining blockchain 410, this requires additional processing and computational power.

Second, accounting changes in a blockchain architecture should be irreversible. This is achieved in practice for current blockchain operations by waiting for a variable number of confirmations from the blockchain (e.g., blockchain 410). By waiting for the variable number of confirmations, the likelihood of an error in the blockchain becomes infinitesimally small. However, while blockchain services rely on this methodology, this is not a rule inherent to the blockchain itself. That is, the blockchain does not have an inherent authentication mechanism that is dependent on a number of confirmations. Instead, the blockchain relies on an absolute system: blockchain operations are either recorded on a particular node or they are not.

As such, forks in the blockchain are always possible. In the case of a fork, system 400 may not follow the “right” fork for an undetermined amount of time. If that happens, and if, for the purpose of a custodial digital wallet, system 400 decides to move from one fork to another, system 400 may have a more straightforward mechanism to maintain an accurate history of a user account's positions if system 400 stores them independently from a given blockchain. Furthermore, in case of forks, system 400 performs some internal remediation on user accounts, which is enabled by system 400 maintaining a layer of insulation, from the blockchain, for remedial blockchain operations. For example, system 400 may have a separate storage, protected by the second ledger (e.g., a ledger service), for reads, and by a transfer service, for writes, that reflect the state of the blockchain that is relevant for system 400 purposes.

In some embodiments, the system may also use one or more application binary interfaces (ABIs). An ABI is an interface between two program modules, often between operating systems and user programs. ABIs may be specific to a blockchain protocol. For example, an Ethereum Virtual Machine (EVM) is a core component of the Ethereum network, and a smart contract may be a piece of code stored on the Ethereum blockchain, which is executed on the EVM. Smart contracts written in high-level languages like Solidity or Vyper may be compiled into EVM-executable bytecode by the system. Upon deployment of the smart contract, the bytecode is stored on the blockchain and is associated with an address. To access functions defined in high-level languages, the system translates names and arguments into byte representations for the bytecode to work with. To interpret the bytes sent in response, the system converts them back to the tuple (e.g., a finite ordered list of elements) of return values defined in higher-level languages. Languages that compile for the EVM maintain strict conventions about these conversions, but in order to perform them, the system must maintain the precise names and types associated with the operations. The ABI documents these names and types precisely, and in an easily parseable format, making translations between human-intended method calls and smart contract operations discoverable and reliable.

For example, an ABI defines the methods and structures used to interact with the binary contract, similar to an API but at a lower level. The ABI tells the caller of a function how to encode (e.g., ABI encoding) the needed information, like function signatures and variable declarations, in a format that the EVM can understand in order to call that function in bytecode. ABI encoding may be automated by the system using compilers or wallets interacting with the blockchain.
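As a minimal sketch of such an encoding, the following shows how call data for a function taking an address and an unsigned 256-bit integer may be assembled for the EVM. The helper names are hypothetical. The four-byte function selector is passed in rather than computed, because it is derived from a Keccak-256 hash of the function signature and the Python standard library does not provide Keccak-256 (its `sha3_256` uses different padding); only static argument types are handled.

```python
def encode_uint256(value: int) -> bytes:
    """Encode an unsigned integer as a 32-byte big-endian word."""
    if not 0 <= value < 2**256:
        raise ValueError("value does not fit in uint256")
    return value.to_bytes(32, "big")


def encode_address(address: str) -> bytes:
    """Encode a 20-byte hex address as a left-padded 32-byte word."""
    hex_part = address[2:] if address.startswith("0x") else address
    raw = bytes.fromhex(hex_part)
    if len(raw) != 20:
        raise ValueError("address must be 20 bytes")
    return raw.rjust(32, b"\x00")


def encode_call(selector: bytes, *encoded_args: bytes) -> bytes:
    """Concatenate the 4-byte selector with the already-encoded arguments."""
    if len(selector) != 4:
        raise ValueError("selector must be 4 bytes")
    return selector + b"".join(encoded_args)


# Selector for transfer(address,uint256), as used by ERC20 tokens.
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")
```

A caller might build call data with `encode_call(TRANSFER_SELECTOR, encode_address(recipient), encode_uint256(amount))`, where `recipient` and `amount` are supplied by the application.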

FIG. 5 shows an illustrative diagram for a blockchain indexer, in accordance with one or more embodiments. For example, in some embodiments, the system may use indexer service 500 to facilitate blockchain operations and/or indexing. Indexer service 500 may fetch raw data (e.g., data related to a current state and/or instance of blockchain 502) from a node of a blockchain network (e.g., as described above). Indexer service 500 may then process the data and store it in a database and/or data structure in an efficient way to provide quick access to the data. For example, indexer 504 may publish and/or record a subset of blockchain operations that occur for blockchain 502. Accordingly, for subsequent blockchain operations, indexer service 500 may reference the index at indexer 504 as opposed to a node of blockchain 502 to provide various services at user device 506.

For example, indexer 504 may store a predetermined list of blockchain operations to monitor for and/or record in an index. These may include blockchain operations (e.g., “operation included,” “operation removed,” “operation finalized”) related to a given type of blockchain operation (e.g., “transaction,” “external transfer,” “internal transfer,” “new contract metadata,” “ownership change,” etc.), as well as blockchain operations related to a given protocol, protocol subgroup, and/or other characteristic (e.g., “ETH,” “ERC20,” and/or “ERC721”). Additionally, and/or alternatively, the various blockchain operations and metadata related to those blockchain operations (e.g., block designations, user accounts, time stamps, etc.), as well as an aggregate of multiple blockchain operations (e.g., total blockchain operations amounts, rates of blockchain operations, rate of blockchain updates, etc.) may be monitored and/or recorded.

Indexer 504 may likewise provide navigation and search features (e.g., support Boolean operations) for the indexed blockchain operations. In some embodiments, indexer 504 may apply one or more formatting protocols to generate representations of indexed blockchain operations in a human-readable format. In some embodiments, indexer 504 may also tag blockchain operations based on whether or not the blockchain operation originated for a local user account (e.g., a user account corresponding to a custodial account) and/or a locally hosted digital wallet. Indexer service 500 may determine whether a blockchain operation contains relevant information for users of indexer service 500 by storing information about whether an address is an internal address of indexer service 500 or one used in a digital wallet hosted by a predetermined wallet service.

Indexer 504 may implement one or more storage and compute layers and may access data stored in one or more datasets. For example, indexer 504 may access one or more blockchain nodes (e.g., node 508 or node 510) to determine a state of one or more blockchain operations and/or smart contracts. For example, the blockchain may be viewed as a distributed world computer, where a number of distributed nodes (e.g., node 508 or node 510) keep track of the same global state and agree upon what state transitions should occur at each block. Each new block in the blockchain is based on consensus and contains the individual transactions that describe the state transition from the previous block to the current one. By replicating the state transitions, such as transactions, the state at any given point in time can be reconstructed by replaying the state transitions according to the rules defined by the blockchain and its associated smart contracts.

To do so, indexer 504 may identify the transactions, receipts, event logs, call traces, as well as the block header and uncle blocks, which would be sufficient to describe the state transitions for the majority of use cases (while minimizing resources needed for storage and processing). For example, to calculate the address balance of the global ledger, indexer 504 selects all the transactions and internal transactions with a non-zero value, projects them as credit/debit operations on from/to addresses, groups the credit/debit operations by address, and then sums up the values. Similarly, though the states of the smart contracts are not extracted, their state transitions can be observed by decoding the event logs and call traces. For example, ERC20-compliant transactions emit a transfer event log for each token transfer, which can be used to derive the token balance of each address. For deeper insights in smart contracts, indexer 504 can decode the call traces, also known as internal transactions, using the ABI of the smart contract. The internal transactions capture information about interactions from one smart contract to another. This type of transaction is widely used in the Ethereum ecosystem, where smart contracts are used as building blocks for more complex interactions.
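The balance calculation described above (select non-zero transfers, project each as a debit and a credit, group by address, sum) may be sketched as follows. The record layout, a dictionary with `from`, `to`, and `value` keys, is an assumption made for illustration; transaction fees and block rewards are deliberately left out of this sketch.

```python
from collections import defaultdict


def address_balances(transfers):
    """Derive per-address net balances from a list of value transfers.

    `transfers` is an iterable of dicts with hypothetical keys
    "from", "to", and "value" (an integer amount).
    """
    balances = defaultdict(int)
    for transfer in transfers:
        value = transfer["value"]
        if value == 0:
            continue  # select only transfers with a non-zero value
        balances[transfer["from"]] -= value  # debit the sender
        balances[transfer["to"]] += value    # credit the recipient
    return dict(balances)
```

For example, two transfers of 5 from `A` to `B` and 2 from `B` to `C` yield `{"A": -5, "B": 3, "C": 2}`. The same projection may be applied to decoded token transfer event logs to derive token balances.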

Indexer 504 may replicate on-chain data (e.g., the transactions, receipts, event logs, call traces, block headers, uncle blocks, and/or any other information stored on blockchain 502 and/or needed to describe the state transitions) into scalable storage and thereby democratize access to blockchain data. For example, this first dataset (e.g., dataset 102 (FIG. 1)) is sufficient as the single source of truth for any further processing needs, in contrast to the traditional ETL (“extract, transform and load”) tools for ingesting blockchain data. Indexer 504 may perform each ETL step differently than a conventional approach. For example, indexer 504 may extract raw data from a pool of load-balanced nodes in parallel. A consensus algorithm is built in to handle chain reorganizations, or “reorgs” (e.g., diagram 600 (FIG. 6)). Indexer 504 may extract the data needed from the nodes such that indexer 504 may never need to go back and query them again.

During the load stage, the raw block data is persisted in S3 while the metadata is stored in DynamoDB. A carefully designed key-value schema (e.g., schema 700 (FIG. 7)) is chosen to enable out-of-order parallel ingestion, while ensuring the observable state is strictly ordered. Indexer 504 may use chain-native and chain-agnostic parsers that are shipped as part of the software development kit (“SDK”) but are not executed during data ingestion.

Indexer 504 may use batch APIs that are available to read blocks in a horizontally scalable manner. An explicit tradeoff is made in the query patterns to support only block-level APIs. As a result, data schema and locality can be optimized so that the read latency is on par with the existing indexers built on top of relational databases.

In some embodiments, indexer 504 may use streaming APIs. Streaming APIs enable downstream systems to keep pace with the blockchain state while being aware of the chain reorg events. Events returned by the streaming APIs are strictly ordered and deterministic. A mono-increasing sequence number is attached to each event to simplify reorg handling (e.g., diagram 600 (FIG. 6)).

One of the main challenges here is how to extract data from the nodes efficiently. One naïve approach would be querying from a single node, thereby eliminating the need to deal with chain reorganization or inconsistent state between the nodes. However, this approach is bottlenecked by the limited throughput of a single node. On the other hand, if blocks are queried from a pool of load-balanced nodes, it would be tricky to implement a consensus algorithm to resolve potentially inconsistent states between the nodes.

In view of this, indexer 504 uses master nodes to query the information as to what blocks are on the canonical chain. Sticky sessions are enabled while reading from the master nodes (e.g., node 508) so that the queries are served by the same node (and fall back to a different node when the previous one becomes unhealthy). To make this query faster, indexer 504 may generally use the batch API to query a range of blocks, without requesting the full transaction objects. Once the block identifiers on the canonical chain are resolved from the master nodes, the full blocks are extracted in parallel and out of order from the slave nodes, which are backed by a pool of load-balanced nodes (e.g., node 510).
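The two-phase extraction described above may be sketched as follows. This is an illustrative sketch only: `resolve_canonical_ids` stands in for a batch query to a master node, `fetch_full_block` stands in for a request to the load-balanced pool, and both names (and the fake implementations used to exercise them) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def extract_blocks(resolve_canonical_ids, fetch_full_block, start, end, workers=4):
    """Resolve canonical block identifiers once, then fetch full blocks in parallel."""
    # Phase 1: one batch query (no full transaction objects) to the master node.
    block_ids = resolve_canonical_ids(start, end)  # {height: block_hash}
    blocks = {}
    # Phase 2: full blocks are fetched in parallel and may complete out of order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            pool.submit(fetch_full_block, height, block_hash): height
            for height, block_hash in block_ids.items()
        }
        for future in as_completed(futures):
            blocks[futures[future]] = future.result()
    return blocks

# Hypothetical stand-ins for the master node and the load-balanced pool.
def fake_resolve(start, end):
    return {height: f"hash-{height}" for height in range(start, end + 1)}

def fake_fetch(height, block_hash):
    return {"height": height, "hash": block_hash, "transactions": []}

blocks = extract_blocks(fake_resolve, fake_fetch, 100, 104)
```

Keying the results by height means that the order in which the parallel fetches complete does not affect the stored result.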

FIG. 6 shows an illustrative diagram illustrating mono-increasing sequence records, in accordance with one or more embodiments. For example, diagram 600 may illustrate an approach to deal with one of the main challenges in designing a dataset, specifically, how to handle chain reorganization. For example, though the blocks themselves are immutable in the blockchain, what constitutes the canonical chain could change due to chain reorgs, as illustrated in diagram 600.

For example, as shown in diagram 600, the changes to the state of the blockchain are modeled as a strictly ordered sequence of added (+) or removed (−) events. Each event is associated with a mono-increasing sequence number, making it easier to implement the change-data-capture pattern in later steps. For example, the canonical chain can be reconstructed by grouping the events by height and taking the item with the largest sequence number from each group. For example, the block stream above can be replicated into a key-value store such as DynamoDB using the time-based versioning pattern.
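As a minimal sketch, the reconstruction described above (group the events by height and keep the event with the largest sequence number in each group) may be expressed as follows. The event fields (`sequence`, `height`, `hash`, `added`) are assumptions for illustration, as is the choice to omit a height whose most recent event is a removal.

```python
def canonical_chain(events):
    """Return {height: block_hash} using the latest event at each height."""
    latest = {}
    for event in events:
        current = latest.get(event["height"])
        if current is None or event["sequence"] > current["sequence"]:
            latest[event["height"]] = event
    # Assumption: a height whose latest event is a removal is not canonical.
    return {h: e["hash"] for h, e in sorted(latest.items()) if e["added"]}

# Hypothetical stream: blocks b and c are orphaned and replaced by b2 and c2.
events = [
    {"sequence": 1, "height": 1, "hash": "a", "added": True},
    {"sequence": 2, "height": 2, "hash": "b", "added": True},
    {"sequence": 3, "height": 3, "hash": "c", "added": True},
    {"sequence": 4, "height": 3, "hash": "c", "added": False},
    {"sequence": 5, "height": 2, "hash": "b", "added": False},
    {"sequence": 6, "height": 2, "hash": "b2", "added": True},
    {"sequence": 7, "height": 3, "hash": "c2", "added": True},
]
chain = canonical_chain(events)
```

Because only the largest sequence number per height matters, the events may be ingested in any order and still yield the same canonical view.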

The system (e.g., implemented on indexer 504 (FIG. 5)) may then generate records (e.g., as shown in table 650) that provide a reorganization-immune blockchain index using the mono-increasing sequence records. For example, as shown in diagram 600, the system may receive on-chain data for a plurality of blocks, wherein the plurality of blocks comprises a first block comprising a first event of a plurality of blockchain events within the on-chain data. The system may determine a sequence number and a chain height for each block as recorded in table 650. The system may use table 650 to update the blockchain index (e.g., as stored on indexer 504). For example, the system may determine the block with the highest sequence number for each block height. The system may then determine the canonical chain for the blockchain based on the blocks that have the highest sequence number for each group of blocks (e.g., blocks having the same chain height). The system may then update a blockchain index to indicate the blocks that correspond to the canonical chain.

FIG. 7 shows an illustrative diagram for a blockchain indexer, in accordance with one or more embodiments. For example, while the layered storage approach mitigates the data availability problem by recording raw blockchain data to an indexer, building business applications directly on top of a dataset featuring raw blockchain data is still a tedious process. The time to reprocess the entire blockchain history is reduced from weeks to days, but it is still not good enough for rapid product iterations. To speed up the reprocessing, the system may have to build a vastly different batch processing pipeline to improve the throughput, yet the code cannot be reused for streaming processing.

These limitations led to the development of a second dataset, which may be built on top of data lakehouse technologies. The data lakehouse is a new paradigm that combines the best elements of data lakes and data warehouses. The second dataset is partitioned at a larger granularity (e.g., many blocks per partition) and optimized for parallel workloads in Apache Spark. For example, hundreds of blocks of data may be stored as a single Parquet file, amortizing the overhead of task scheduling and network round trips.

The underlying storage may be in columnar format, so only the data needed by the query is loaded into memory. This is important from a performance point of view, because a typical application only needs to read a small portion of this dataset. Additionally, a dataset can be written incrementally while downstream consumers are reading from it simultaneously; therefore, the complex business data flow can be modeled as a continuously running streaming application, as shown in schema 700. For example, schema 700 includes a continuously running data stream (e.g., comprising data 702 and data 704).

For example, table 706 may comprise an append-only delta table of the continuously running data stream. Table 706 may be a continuous replication of a first dataset (e.g., dataset 102 (FIG. 1)) with an end-to-end exactly-once-delivery guarantee. Table 706 may present a canonical view of the blockchain state and/or a transaction-canonical-view, where new transactions are inserted and orphaned transactions are soft deleted. In addition to these tables, the system can further decode and enrich the dataset. For example, the system may use the ABI of smart contracts to turn the encoded data (such as event logs and traces) into decoded data (such as ERC-20 token transfers). By materializing these intermediate tables, downstream users can work with semantic-rich data models rather than low-level details.

Table 706 may model the data stream as an unbounded, continuously updated table. As new data (e.g., data 702 or data 704) is made available in the input data stream, one or more rows (e.g., row 708) are appended to the unbounded table as a micro batch. From the perspective of downstream users, the query on this conceptual input table can be defined as if it were a static table. For example, the system may automatically convert this batch-like query to a streaming execution plan through incrementalization, which determines what state needs to be maintained to update the result each time a new micro batch arrives.
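To illustrate incrementalization in miniature, the following sketch maintains only a running per-token count as state and updates it for each micro batch, producing the same result as a batch query over the whole table. It is a hand-written stand-in rather than the mechanism of any particular streaming engine, and the class, function, and field names are hypothetical.

```python
from collections import Counter

def batch_transfer_count(table):
    """Batch-style query: count rows per token over the whole table."""
    return dict(Counter(row["token"] for row in table))

class IncrementalTransferCount:
    """Streaming form of the same query; the state is the running count."""

    def __init__(self):
        self.state = Counter()

    def update(self, micro_batch):
        # Only the newly appended rows are read; earlier rows are not rescanned.
        for row in micro_batch:
            self.state[row["token"]] += 1
        return dict(self.state)

batch_1 = [{"token": "T1"}, {"token": "T2"}]
batch_2 = [{"token": "T1"}]

query = IncrementalTransferCount()
query.update(batch_1)
streaming_result = query.update(batch_2)
batch_result = batch_transfer_count(batch_1 + batch_2)
```

In this sketch the two results agree after every micro batch, which is the property that lets downstream users write the query as if the table were static.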

FIG. 8 shows a flowchart of the steps involved in improving blockchain data indexing by decoupling compute and storage layers, in accordance with one or more embodiments.

At step 802, process 800 (e.g., using one or more components described above) receives, at a blockchain-interface layer, first on-chain data from a blockchain node of a blockchain network. For example, the system may use multiple layers (or layered programs) to introduce technical efficiencies into the indexing process. Each layer may comprise separate functional components that interact with other layers in a sequential and/or hierarchical manner. In some embodiments, each layer may interface only with the layer above it and the layer below it (e.g., in the programming stack).

The first on-chain data may comprise hexadecimal encoded data from a first block of the blockchain network. For example, the system may receive raw blockchain data. The raw blockchain data may comprise alphanumeric characters and/or alphanumeric text strings. In some embodiments, on-chain data may comprise data as retrieved from, or available on, a block of a blockchain network. For example, on-chain data may comprise data as retrieved from a node and prior to any local processing to cleanse, modify, and/or organize the data. For example, in many blockchain networks, raw blockchain data is written in a hexadecimal encoded format. For example, hexadecimal encoding is a transfer encoding in which each byte is converted to the 2-digit base-16 encoding of that byte (preserving leading zeros), which is then usually encoded in ASCII.
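As a brief illustration of the encoding described above, the following sketch uses Python's built-in `bytes.hex` and `bytes.fromhex` to show each byte becoming two base-16 digits with leading zeros preserved. The sample bytes and the `0x`-prefixed quantity are illustrative values only.

```python
# Each byte becomes exactly two hexadecimal digits; leading zeros are kept.
raw = bytes([0, 15, 255])
encoded = raw.hex()               # two digits per byte
decoded = bytes.fromhex(encoded)  # round-trips back to the original bytes

# A "0x"-prefixed hexadecimal quantity, as commonly returned by node APIs,
# can be parsed into an integer.
block_number = int("0x1b4", 16)
```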

The blockchain-interface layer may transform the first on-chain data to a first format, using a first compute engine, for storage in a first dataset. For example, raw blockchain data may comprise unstructured data. Unstructured data may be information that either does not have a predefined data model or is not organized in a predefined manner. Unstructured data may comprise alphanumeric character strings and/or hexadecimal strings. For example, unstructured data, which may be categorized as qualitative data, cannot be processed and analyzed via conventional data tools and methods. Since unstructured data does not have a predefined data model, the system may best manage it in a non-relational (NoSQL) database or use one or more data lakes to preserve it in raw form.

In some embodiments, a compute engine may comprise a customizable compute service that the system may use to create and run virtual machines and perform tasks on a given dataset. Each compute engine may comprise a given schema. The schema may comprise an architecture of how data will be processed, and a database schema describes the shape of the data and how it relates to other models, tables, and databases. For example, a database entry may be an instance of the database schema, containing all the properties described in the schema.

In some embodiments, the first format may comprise data types with field names identified by a respective integer. For example, the first dataset may comprise a structured data structure defined in protocol buffers (Protobuf) format. For example, Protobuf is a data format used to serialize structured data. Protobuf comprises an interface description language that describes the structure of some data and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data. For example, the first dataset may use a schema that associates data types with field names, using integers to identify each field. That is, the data may contain only the numbers, not the field names, which generates bandwidth/storage savings as compared with schemas that include the field names in the data.
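To make the storage saving concrete, the following sketch hand-encodes a single integer field the way the Protobuf wire format does for non-negative varint fields: the serialized bytes carry the field number and wire type, never the field name. The helper names are hypothetical, and an actual implementation would use code generated by the Protobuf compiler rather than this hand-written encoder.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_varint_field(field_number, value):
    """Encode one varint field: tag (field number and wire type 0), then value."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)

# Field number 1 (e.g., a hypothetical "height" field) holding the value 150.
encoded = encode_varint_field(1, 150)
```

Here the field occupies three bytes regardless of how long its name is in the schema, since the name appears only in the schema definition.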

In some embodiments, transforming the first on-chain data to the first format may comprise receiving unstructured on-chain data, determining a structuring condition based on the first format, and applying the structuring condition to the unstructured on-chain data to generate structured on-chain data. For example, each new block in the blockchain is based on consensus and contains the individual transactions that describe the state transition from the previous block to the current one. By replicating the state transitions, such as transactions, the state at any given point in time can be reconstructed by replaying the state transitions according to the rules defined by the blockchain and its associated smart contracts. For example, in a first blockchain network (e.g., Ethereum), the system may determine that the transactions, receipts, event logs, call traces, block headers, and/or uncle blocks (or a subset thereof) are sufficient to describe the state transitions for a given application's use cases. For example, to calculate the address balance of the global ledger, the system may select all the transactions and internal transactions with a non-zero value, project them as credit/debit operations on from/to addresses, group the credit/debit operations by address, and then sum up the values. The system may receive this unstructured data and transform it to a structured format in the indexing application using a structuring condition. For example, a structuring condition may be based on an order of blocks in a blockchain. That is, the system may retrieve the unstructured on-chain data (e.g., comprising one or more state transitions) and may structure this data into a series of blockchain operations. In another example, the structuring condition may be based on a given smart contract, user, wallet address, etc. The system may then structure the various state transitions in the unstructured on-chain data into a serial repository of blockchain operations involving the given smart contract, user, wallet address, etc.

In some embodiments, the system may transform the first on-chain data to the first format by receiving unstructured on-chain data, parsing the unstructured on-chain data for an unstructured on-chain data characteristic, and generating a semantic marker for the unstructured on-chain data characteristic, wherein the semantic marker is stored in the first dataset. For example, an unstructured on-chain data characteristic may comprise any quantitative or qualitative characteristic of the unstructured on-chain data that distinguishes one portion of the unstructured on-chain data from another. For example, the unstructured on-chain data characteristic may comprise an appearance (or lack thereof) of a specific text string of alphanumeric characters, an order (or lack thereof) of alphanumeric characters, etc. The system may transform this to structured data.

For example, the first dataset may comprise semi-structured data. Semi-structured data may not have a predefined data model and is more complex than structured data, but may be easier to store than unstructured data. Semi-structured data uses metadata (e.g., tags and semantic markers) to identify specific data characteristics and scale data into records and preset fields. The system may use the metadata to better catalog, search, and analyze the data in the first dataset than unstructured data.

In some embodiments, receiving the first on-chain data from the blockchain node may comprise the system selecting the first block of the blockchain network, querying the first block for available data matching a retrieval criterion, and executing a retrieval operation to retrieve any available data matching the retrieval criterion. For example, the system may select a given block from the blockchain network and extract all required information from the block.

In some embodiments, the system may select the block in response to detecting that information has not yet been retrieved from the block or that specific information (e.g., relating to a specific blockchain operation) is located in the block. For example, the system may query a given block for all available information. By doing so, the system does not need to return to the block (or blockchain network) again. When doing so, the system may extract a subset of the available data in the block in order to minimize processing loads and storage resources. In such cases, the system may retrieve only available data that matches one or more retrieval criteria. For example, the system may retrieve raw smart contract storage data, which is not easily available. Without the smart contract storage data, the system may have to re-query and re-extract data from archive nodes. For example, by extracting core smart contract storage data, the system may avoid instances where the system must re-query the block to extract a new state out of smart contracts that were previously not supported.

The system may determine what data is required and/or what data comprises smart contract storage data based on the blockchain network. Extracting smart contract storage data may comprise extracting information on the transaction, event logs, and/or traces, as well as the block header and uncle blocks. For example, uncle blocks are created when two blocks are mined and broadcast at the same time (with the same block number). Since only one of the blocks can enter the primary Ethereum chain, the block that gets validated across more nodes becomes the canonical block, and the other one becomes what is known as an uncle block. In some embodiments, the system may store uncle information in order to support reorganization immunity for blockchain data sets.

Furthermore, for a smart contract specific state, if the state is emitted as part of the event logs or traces, then the system does not need to go back and re-extract additional data from the archive nodes. For example, the system may execute a retrieval operation that parses the unstructured raw blockchain data for any available data that matches one or more retrieval criteria. In some embodiments, the system may use parsing criteria specific to the retrieval operation.

To increase efficiency, in some embodiments, the system may designate a first blockchain node of a plurality of blockchain nodes for a blockchain network as having a first node type, and based on designating the first blockchain node of the plurality of blockchain nodes as having the first node type, establish a session with the first blockchain node. For example, the system may implement process 900 (FIG. 9).

At step 804, process 800 (e.g., using one or more components described above) receives, at a data lakehouse layer, the first on-chain data in the first format. For example, while the first dataset may comprise structured or semi-structured raw blockchain data, and thus delay error-prone parsing and data augmentation until later, raw blockchain data (even in a structured or semi-structured format) is difficult to use to run applications. For example, to speed up the reprocessing of the raw blockchain data, the system may build different batch processing pipelines; however, the underlying code cannot be reused for streaming processing. As such, a data lakehouse layer may comprise a different data structure type.

In some embodiments, the data lakehouse layer may comprise a combination of a data lake with a data warehouse in a single data platform. A data lakehouse is a data solution concept that combines elements of the data warehouse with those of the data lake. Data lakehouses implement data warehouses' data structures and management features for data lakes, which are typically more cost-effective for data storage. For example, a data lake is a centralized repository that allows the system to store structured and unstructured data at any scale. The system can store data as-is, without having to first structure the data, and run different types of analytics, from dashboards and visualizations to big data processing, real-time analytics, and machine learning, to guide better decisions. In contrast, a data warehouse is a type of data management system that is designed to enable and support business intelligence activities, especially analytics. Data warehouses are solely intended to perform queries and analysis and often contain large amounts of historical data. The data within a data warehouse is usually derived from a wide range of sources such as application log files and transaction applications. Data lakehouses are useful to data scientists as they enable machine learning and business intelligence.

For example, the system may receive, at a data lakehouse layer, the first on-chain data in the first format, wherein the data lakehouse layer transforms the first on-chain data to a second format, using a second compute engine, for storage in a second dataset. For example, the second dataset may be partitioned at a larger granularity (e.g., many blocks per partition) than the first dataset. Additionally or alternatively, the workflow architecture of the compute engine for the second dataset may be optimized for parallel workloads with high processing rates.

In some embodiments, the second format comprises a columnar oriented format, wherein the second dataset comprises the first on-chain data and second on-chain data, and wherein the second on-chain data is from a second block on the blockchain network.

In some embodiments, the first dataset may maintain the first on-chain data as the hexadecimal encoded data while in the first format, wherein the second dataset does not maintain the first on-chain data as the hexadecimal encoded data while in the second format. In some embodiments, the system may format raw blockchain data to be structured or semi-structured, but may maintain the native programming/coding language. For example, for both ERC20 and NFT data, the system may store the raw (e.g., hexadecimal encoded data) event logs and traces in the first dataset. The system may perform this as the raw blockchain data for these protocols does not create additional processing burdens for the compute engine. For example, the system may determine that a modification of a workflow architecture is not required to process this data while serving application requests.

At step 806, process 800 (e.g., using one or more components described above) determines an application characteristic for an application. For example, the system may determine an application characteristic for an application that performs blockchain operations using the first on-chain data or the second on-chain data. For example, the system may determine application characteristics for business-level applications. While the compute engine may remain the same in some embodiments, the system may select the storage system and/or format that best suits the application's needs.

At step 808, process 800 (e.g., using one or more components described above) receives, at an application service layer, the first on-chain data and the second on-chain data in the second format. For example, the system may receive, at an application service layer, the first on-chain data and the second on-chain data in the second format, wherein the application service layer transforms, using a third compute engine, the first on-chain data and the second on-chain data to a third format for storage in a third dataset. Furthermore, the third dataset may be structured based on application needs. In addition, the third dataset may be continuously and incrementally updated based on information received from lower layers and/or the blockchain node, as well as information received by an API layer of an application. The third dataset may therefore be customized to meet the needs and formatting requirements of the API for the application.

For example, the third format may be dynamically selected based on the application characteristic. For example, the API layer of the applications can subscribe to a Kafka topic to perform further processing. For example, asset discovery of ERC-20, ERC-721, ERC-1155, etc., can be implemented this way. As one example, an application service layer may be responsible for producing the transfer events based on the token standards, and then an Asset Discovery Service (or other layer) may pull in additional on-chain (e.g., symbol/decimals) and off-chain metadata (e.g., token icon) asynchronously. An optimization may also be done in an application service layer to deduplicate the transfer events of the same address using time-based window aggregation. That is, the application service layer may use specific formats and perform specific operations based on the needs of an application and/or the best mechanism for optimizing the application (and/or its interactions with other layers/applications/data sources).
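One way to picture the time-based window aggregation mentioned above is a tumbling window that keeps at most one transfer event per address per window. The following is a minimal sketch under that assumption; the window length, the event fields (`address`, `timestamp`), and the function name are hypothetical and are not taken from this disclosure.

```python
def deduplicate_transfers(events, window_seconds=60):
    """Keep the first transfer event seen for each (address, window) pair."""
    seen = set()
    unique = []
    for event in events:
        # Events whose timestamps fall in the same tumbling window share a key.
        key = (event["address"], event["timestamp"] // window_seconds)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique

# Hypothetical transfer events with timestamps in seconds.
events = [
    {"address": "0xA", "timestamp": 0},
    {"address": "0xA", "timestamp": 30},  # same address, same window: dropped
    {"address": "0xB", "timestamp": 10},
    {"address": "0xA", "timestamp": 70},  # same address, next window: kept
]
unique = deduplicate_transfers(events)
```

With this approach a downstream service would need to fetch metadata for a given address at most once per window, however many transfers that address produces.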

In some embodiments, the system may select a format based on data freshness. For example, a technical challenge in dealing with blockchain data is how quickly a system may reprocess the entire blockchain in order to transmit the data to an application. Depending on the data freshness requirements for a given application, the system may select a format that is optimized for throughput as opposed to latency. For example, the system may determine the application characteristic by determining a data freshness requirement for the application and selecting the third format from a plurality of formats based on the third format corresponding to the data freshness requirement.

Additionally, or alternatively, the system may select a dataset from which an application should pull data. For example, end-to-end data freshness is mainly constrained by the type of compute engine selection (e.g., the threshold for workflow throughput, whether the compute engine is batch-oriented, stream-oriented, or real-time oriented, and/or other compute engine performance metrics). Accordingly, the system may select the compute engine based on the needs of a requesting application. Furthermore, for time-critical use cases where historical data is unnecessary, the system can communicate with the blockchain nodes directly. Alternatively, or additionally, the system may use a streaming API, which may provide better data freshness (e.g., at about 30 seconds from block production time). For example, the system may receive a request, from the application, for the first on-chain data and the second on-chain data. The system may then select between the blockchain node, the first dataset, the second dataset, and the third dataset for responding to the request based on the application characteristic.

At step 810, process 800 (e.g., using one or more components described above) transmits the first on-chain data and the second on-chain data in the third format to the application. For example, the system may serve an API layer of the application. In such cases, the format used by the application service layer may be based on the API layer.

In some embodiments, the system may use different compute engines at each layer. For example, the first compute engine may comprise a first workflow architecture, wherein the first workflow architecture comprises a first threshold for workflow throughput and a first threshold for a number of workflows. For example, the system may select a compute engine for processing data in the first dataset based on the workflow architecture of the compute engine. For example, the main limitation of a workflow architecture with a low threshold for workflow throughput (e.g., a threshold rate at which events may be processed) and a high threshold number of workflows (e.g., a threshold number of workflows that may simultaneously process events) is in data processing situations with a high amount of aggregation. For example, a workflow architecture with a low threshold for workflow throughput and a high threshold number of workflows has a limited throughput for each workflow, but this workflow architecture allows for the total number of workflows to be high. Such a workflow architecture is well suited for a dataset based on events corresponding to individual workflows (e.g., updates for given smart contracts, tokens, etc.). For example, a workflow architecture of this type may aggregate events per smart contract, token, etc., for millions of different smart contracts, tokens, etc., as the rate of events for each of these is low (e.g., less than 30 events per second). In contrast, such a workflow architecture may be ill suited for processing a dataset and/or use cases involving a high number of events in a low number of workflows.

Additionally, or alternatively, the second compute engine and/or third compute engine may comprise a second workflow architecture, wherein the second workflow architecture comprises a second threshold for workflow throughput and a second threshold for the number of workflows, wherein the second threshold for workflow throughput is higher than the first threshold for workflow throughput, and wherein the second threshold for the number of workflows is lower than the first threshold for the number of workflows. For example, the system may select a second compute engine for processing data in the second dataset based on the workflow architecture of the second compute engine. Furthermore, as the second dataset comprises on-chain data for a plurality of blocks, the workflow architecture for the second compute engine may require the ability to process a high rate of events. For example, as the second dataset processes and stores data at a different level of granularity, the second compute engine may require fewer individual workflows (e.g., a lower threshold for the number of workflows) and instead a higher rate of event processing (e.g., a high threshold for workflow throughput).

It is contemplated that the steps or descriptions of FIG. 8 may be used with any other embodiment of this disclosure. In addition, the steps and descriptions described in relation to FIG. 8 may be done in alternative orders or in parallel to further the purposes of this disclosure. For example, each of these steps may be performed in any order, in parallel, or simultaneously to reduce lag or increase the speed of the system or method. Furthermore, it should be noted that any of the components, devices, or equipment discussed in relation to the figures above could be used to perform one or more of the steps in FIG. 8.

FIG. 9 shows a flowchart of the steps involved in improving blockchain data indexing by avoiding throughput bottlenecks caused by reliance on a single blockchain node, in accordance with one or more embodiments.

At step 902, process 900 (e.g., using one or more components described above) designates a first blockchain node as having a first node type. For example, the system may designate a first blockchain node of a plurality of blockchain nodes for a blockchain network as having a first node type. For example, the system may designate a first node as a master node. For example, master/slave is a model of asymmetric communication or control where one device or process (the “master”) controls one or more other devices or processes (the “slaves”) and serves as their communication hub. In some systems, a master is selected from a group of eligible devices, with the other devices acting in the role of slaves.

In some embodiments, the system may identify a plurality of blockchain nodes for the blockchain network. The system may then determine a plurality of blockchain node identifiers, wherein the plurality of blockchain node identifiers comprises a respective blockchain node identifier for each of the plurality of blockchain nodes. For example, each node in a blockchain network may have a unique identifier that allows for that node to be specifically identified on the network. The identifier may comprise an alphanumeric character string. In some embodiments, the system may designate identifiers for blockchain nodes. For example, some blockchain standards (e.g., Bitcoin) do not have a unique identifier by design. For example, any property that allows someone on the network to verify whether two connections (even separated in time) are to the same node may lead to a fingerprinting attack, where this information could be used to link transactions coming from the same node together.

Furthermore, in some embodiments, determining the plurality of blockchain node identifiers may comprise the system designating the respective blockchain node identifier for each of the plurality of blockchain nodes and configuring each of the plurality of blockchain nodes to output the respective blockchain node identifier in response to a blockchain operation. For example, in some blockchain networks, nodes do not have unique identification at the time of creation. The system may trigger the blockchain nodes to output an identifier (e.g., in response to a query to the node). The output may comprise a test string encoded within the output that identifies the blockchain node.

At step 904, process 900 (e.g., using one or more components described above) establishes a session with the first blockchain node. For example, the system may, based on designating the first blockchain node of the plurality of blockchain nodes as having the first node type, establish a session with the first blockchain node. For example, the system may establish a sticky session while reading from the master nodes so that the queries are served by the same node. In such a case, the system may use a load balancer to create an affinity between the system and a specific blockchain node for the duration of a session. For example, establishing a sticky session offers a number of benefits that can improve performance, including minimizing data exchange (e.g., servers within the system do not need to exchange session data) and better cache utilization (e.g., resulting in better responsiveness). For example, the system may use a blockchain node identifier to route all requests to a specific blockchain node.

In some embodiments, the system may also designate a fallback node. For example, the system may enable a sticky session while reading from the master node so that the queries are served by the same node (and fall back to a different node when the previous one becomes unhealthy). For example, the system may designate a fourth blockchain node as having the first node type. The system may detect a failure in maintaining the session with the first blockchain node. The system may, in response to detecting the failure in maintaining the session with the first blockchain node, re-establish the session with the fourth blockchain node.
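As an illustrative, non-limiting sketch of the sticky session with a fallback node described above, the following Python example pins queries to one master node and re-establishes the session with another master node when the pinned node is detected as unhealthy. The class name `StickySession`, the node identifiers, and the `is_healthy` probe are hypothetical and are assumed only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StickySession:
    """Pins queries to one master node and fails over when it is unhealthy."""

    masters: List[str]                 # node identifiers, in preference order
    is_healthy: Callable[[str], bool]  # hypothetical health probe
    current: str = field(init=False)

    def __post_init__(self) -> None:
        self.current = self.masters[0]

    def node_for_query(self) -> str:
        # Keep the affinity while the pinned node is healthy.
        if self.is_healthy(self.current):
            return self.current
        # Otherwise re-establish the session with another healthy master.
        for candidate in self.masters:
            if candidate != self.current and self.is_healthy(candidate):
                self.current = candidate
                return candidate
        raise RuntimeError("no healthy master node available")


health: Dict[str, bool] = {"node-1": True, "node-4": True}
session = StickySession(["node-1", "node-4"], lambda node: health[node])
first = session.node_for_query()
health["node-1"] = False  # simulate a failure in maintaining the session
second = session.node_for_query()
```

In this sketch, queries return to being served by a single node after the failover, so the affinity is preserved for the remainder of the session.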

At step 906, process 900 (e.g., using one or more components described above) determines an order of a first block and a second block. For example, the system may, while maintaining the session, determine an order of a first block and a second block on a canonical chain of the blockchain network. In some embodiments, the system may retrieve a first blockchain node identifier of the first blockchain node. The system may transmit a first query to the first blockchain node based on the first blockchain node identifier, wherein the first query comprises a request to identify a plurality of blocks on the canonical chain of the blockchain network. The system may receive a first response to the first query, wherein the first response identifies the first block and the second block on the canonical chain, and wherein the first response identifies the order of the first block and the second block on the canonical chain. For example, the system may first select a plurality of nodes comprising designated master nodes and slave nodes. The system uses the master nodes to query for information as to which blocks are on the canonical chain.

In some embodiments, the system may utilize an API to call multiple blocks. For example, to improve the efficiency and speed of the query, the system may use a batch API to query a range of blocks, without requesting the full transaction objects. For example, batch calls allow API applications to make multiple API calls within a single API call. In addition, each call may designate multiple blocks, meaning that the batch API call generates less traffic and/or gas fees. In such cases, the system may generate a batch application programming interface call to query a range of blocks of the canonical chain, wherein the range of blocks comprises the first block and the second block. The system may transmit the batch application programming interface call to the first blockchain node.
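As an illustrative, non-limiting sketch, a batch call for a range of blocks may be built as a single JSON-RPC batch. The example assumes an Ethereum-style JSON-RPC node, which this disclosure does not require; the helper name `build_batch_request` is hypothetical. The `eth_getBlockByNumber` method takes a hexadecimal block number and a flag, and passing `False` requests transaction hashes rather than full transaction objects.

```python
import json
from typing import Any, Dict, List


def build_batch_request(start: int, end: int) -> List[Dict[str, Any]]:
    """Build one JSON-RPC batch covering blocks start..end inclusive."""
    return [
        {
            "jsonrpc": "2.0",
            "id": height,
            "method": "eth_getBlockByNumber",
            # False asks for transaction hashes only, not full objects.
            "params": [hex(height), False],
        }
        for height in range(start, end + 1)
    ]


batch = build_batch_request(100, 102)
payload = json.dumps(batch)  # body of a single request to the master node
```

Only the request body is constructed here; transmitting it to the first blockchain node is left to whichever HTTP client an implementation uses.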

At step 908, process 900 (e.g., using one or more components described above) designates a second blockchain node and a third blockchain node as having a second node type. For example, the system may, while maintaining the session, designate a second blockchain node and a third blockchain node of the plurality of blockchain nodes as having a second node type. For example, the system may designate the second blockchain node and the third blockchain node as slave nodes.

At step 910, process 900 (e.g., using one or more components described above) transmits, in parallel, queries to the second blockchain node and the third blockchain node. For example, the system may, while maintaining the session, based on designating the second blockchain node and the third blockchain node of the plurality of blockchain nodes as having the second node type, transmit, in parallel, queries to the second blockchain node and the third blockchain node for first on-chain data from the first block and second on-chain data from the second block, respectively.

In some embodiments, when transmitting, in parallel, the queries to the second blockchain node and the third blockchain node, the system may retrieve blockchain node identifiers. For example, the system may retrieve a second blockchain node identifier of the second blockchain node. The system may transmit a second query to the second blockchain node based on the second blockchain node identifier, wherein the second query comprises a request for the first on-chain data from the first block. The system may retrieve a third blockchain node identifier of the third blockchain node. The system may transmit a third query to the third blockchain node based on the third blockchain node identifier, wherein the third query comprises a request for the second on-chain data from the second block. For example, once the block identifiers on the canonical chain are resolved from the master nodes, the system may extract the full blocks in parallel, and/or out of order, from the slave nodes, which may be backed by a pool of load-balanced nodes.
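As an illustrative, non-limiting sketch, the parallel queries to the slave nodes may be issued with a thread pool. The function `fetch_block` is a hypothetical stand-in for a network query to one node, and the node and block identifiers are assumed for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def fetch_block(node_id: str, block_id: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical stand-in for a network query to one slave node."""
    return block_id, {"served_by": node_id, "data": f"payload-of-{block_id}"}


def fetch_in_parallel(
    assignments: List[Tuple[str, str]],
) -> Dict[str, Dict[str, str]]:
    """Query every (node, block) pair concurrently, keyed by block identifier."""
    with ThreadPoolExecutor(max_workers=len(assignments)) as pool:
        futures = [
            pool.submit(fetch_block, node, block) for node, block in assignments
        ]
        return dict(future.result() for future in futures)


blocks = fetch_in_parallel([("node-2", "block-A"), ("node-3", "block-B")])
```

Because the results are keyed by block identifier, the order in which the responses complete does not affect later indexing.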

In some embodiments, when transmitting, in parallel, the queries to the second blockchain node and the third blockchain node, the system may use one or more processing metrics. The processing metrics may be based on characteristics of the blockchain nodes, such as costs related to each node, current loads on each node, security levels of each node, etc. For example, the system may retrieve a processing metric indicating a current load on the second blockchain node. The system may compare the processing metric to a threshold metric (e.g., based on a predetermined level, load on other nodes, etc.). In response to determining that the processing metric does not equal or exceed the threshold metric, the system may select to query the second blockchain node for the second on-chain data.

At step 912, process 900 (e.g., using one or more components described above) receives the first on-chain data or the second on-chain data. For example, the system may, while maintaining the session, receive the first on-chain data or the second on-chain data.

In some embodiments, the system may receive, at a blockchain-interface layer, the first on-chain data, wherein the first on-chain data comprises hexadecimal encoded data from the first block of the blockchain network, wherein the blockchain-interface layer transforms, using a first compute engine, the first on-chain data to a first format, and wherein the first format comprises data types with field names identified by a respective integer. For example, the system may use multiple layers (or layered programs) to introduce technical efficiencies into the indexing process. Each layer may comprise separate functional components that interact with other layers in a sequential and/or hierarchical manner. In some embodiments, each layer may interface only with the layer above it and the layer below it (e.g., in the programming stack).

At step 914, process 900 (e.g., using one or more components described above) indexes, in a first dataset, the first on-chain data or the second on-chain data based on the order. For example, the system may, in response to receiving the first on-chain data or the second on-chain data, index, in a first dataset, the first on-chain data or the second on-chain data based on the order of the first block and the second block on the canonical chain.

For example, the system may determine locations on a canonical chain based on the blocks when indexing the first on-chain data or the second on-chain data based on the order of the first block and the second block on the canonical chain. For example, the system may receive a second response to the second query, wherein the second response comprises the first on-chain data. The system may determine a first location on the canonical chain corresponding to the first block. The system may label the first on-chain data as corresponding to the first location in the first dataset. The system may receive a third response to the third query, wherein the third response comprises the second on-chain data. The system may determine a second location on the canonical chain corresponding to the second block. The system may label the second on-chain data as corresponding to the second location in the first dataset.
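As an illustrative, non-limiting sketch, on-chain data that arrives out of order may be labeled with its location on the canonical chain as follows. The function name `index_by_canonical_order` and the sample identifiers are hypothetical.

```python
from typing import Dict, List


def index_by_canonical_order(
    canonical: List[str], responses: Dict[str, str]
) -> Dict[int, str]:
    """Label each payload with its block's location on the canonical chain."""
    location = {block_id: i for i, block_id in enumerate(canonical)}
    return {
        location[block_id]: data
        for block_id, data in responses.items()
        if block_id in location  # ignore blocks not on the canonical chain
    }


canonical_order = ["block-A", "block-B"]  # resolved from the master node
arrived = {  # responses received out of order from the slave nodes
    "block-B": "second on-chain data",
    "block-A": "first on-chain data",
}
dataset = index_by_canonical_order(canonical_order, arrived)
```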

It is contemplated that the steps or descriptions of FIG. 9 may be used with any other embodiment of this disclosure. In addition, the steps and descriptions described in relation to FIG. 9 may be done in alternative orders or in parallel to further the purposes of this disclosure. For example, each of these steps may be performed in any order, in parallel, or simultaneously to reduce lag or increase the speed of the system or method. Furthermore, it should be noted that any of the components, devices, or equipment discussed in relation to the figures above could be used to perform one or more of the steps in FIG. 9.

FIG. 10 shows a flowchart of the steps involved in creating a reorganization-immune blockchain index using mono-increasing sequence records, in accordance with one or more embodiments.

At step 1002, process 1000 (e.g., using one or more components described above) receives on-chain data comprising a block and an event. For example, the system may receive on-chain data for a plurality of blocks, wherein the plurality of blocks comprises a first block comprising a first event of a plurality of blockchain events within the on-chain data. Additionally, or alternatively, the plurality of blocks may further comprise a second block comprising a second event of the plurality of blockchain events within the on-chain data. For example, the first event and/or the second event may comprise on-chain events (e.g., transactions) and/or blockchain operations.

At step 1004, process 1000 (e.g., using one or more components described above) determines a sequence number for the event. For example, the system may determine a first sequence number for the first event. Additionally, or alternatively, the system may determine a second sequence number for the second event. For example, instead of overwriting data in the dataset when a change is detected, the system may model the changes as a strictly ordered sequence of added (+) or removed (−) events. As such, each event may be associated with a mono-increasing sequence number, making it easier to implement the change-data-capture pattern in later steps. For example, change-data-capture is a software process that identifies and tracks changes to data in a database. Change-data-capture provides real-time or near-real-time movement of data by moving and processing data continuously as new database events occur. Notably, such processes are not conventionally available to blockchain data. As such, the system may perform a data integration process in which data is extracted from various sources (e.g., various blocks in one or more blockchains) and delivered to a data lakehouse, data warehouse, database, and/or data lake. By doing so, the system may receive the benefits of change-data-capture processes. For example, in high-velocity data environments where time-sensitive decisions are made, change-data-capture allows the system to achieve low-latency, reliable, and scalable data replication, as well as zero-downtime migrations to cloud resources. In the present case, this also allows the system to rapidly update the index during reorganizations. As this can be done in real-time, the index becomes reorganization-immune.

The system may assign the sequence number based on numerous methods. For example, the system may assign the first sequence number to the first event based on an order in which the first event was received by an indexing application. For example, the sequence in which the event was received by the indexing application may differ from a sequence in which the event happened. For example, the system may receive events out of order as data is extracted from different blocks from a plurality of slave nodes. Additionally, or alternatively, the system may assign the first sequence number to the first event based on an order of the first block and a second block on the canonical chain of the blockchain network. For example, the system may determine the sequence number based on an order in a canonical chain of a blockchain network. In some embodiments, the system may receive this order from a blockchain node processing pool. The system may use a version of master/slave processing. For example, once the block identifiers on the canonical chain are resolved from the master nodes, the system may extract the full blocks in parallel, and/or out of order, from the slave nodes, which may be backed by a pool of load-balanced nodes.
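As an illustrative, non-limiting sketch, the strictly ordered sequence of added (+) and removed (−) events may be modeled as an append-only log in which each event receives the next mono-increasing sequence number in the order it is received. The names `BlockEvent` and `EventLog` and the sample hashes are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BlockEvent:
    sequence: int    # mono-increasing sequence number
    kind: str        # "+" for an added block, "-" for a removed block
    height: int      # chain height of the block
    block_hash: str


class EventLog:
    """Append-only log; existing events are never overwritten."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self.events: List[BlockEvent] = []

    def append(self, kind: str, height: int, block_hash: str) -> BlockEvent:
        event = BlockEvent(next(self._counter), kind, height, block_hash)
        self.events.append(event)
        return event

    def since(self, sequence: int) -> List[BlockEvent]:
        """Change-data-capture style read: events after a known sequence."""
        return [e for e in self.events if e.sequence > sequence]


log = EventLog()
log.append("+", 100, "0xaa")
log.append("+", 101, "0xbb")
log.append("-", 101, "0xbb")  # a reorganization removes block 0xbb
log.append("+", 101, "0xcc")  # and adds its replacement at the same height
changes = log.since(2)
```

A downstream consumer that remembers the last sequence number it processed can call `since` to obtain only the changes it has not yet applied.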

At step 1006, process 1000 (e.g., using one or more components described above) determines a chain height for the block. For example, the system may determine a first chain height for the first block. Additionally, or alternatively, the system may determine a second chain height for the second block. For example, the block height (or chain height) of a particular block is defined as the number of blocks preceding it in the blockchain. In some embodiments, the chain height can either reference the location in the blockchain of a transaction that was completed in the past, or refer to the present length, block location within a chain, and/or size of a blockchain.

At step 1008, process 1000 (e.g., using one or more components described above) detects a blockchain network reorganization. For example, the system may detect a blockchain network reorganization. For example, the system may receive a system update based on manual user input indicating that a blockchain network has undergone a reorganization event. Alternatively, or additionally, the system may detect a blockchain reorganization based on detecting a fork in the blockchain network. For example, a chain reorganization (or “reorg”) takes place when a node receives blocks that are part of a new longest chain. The node will then deactivate blocks in its old longest chain in favor of the blocks that build the new longest chain.

The system may detect the blockchain network reorganization using numerous methods. For example, a chain reorganization may occur after two blocks have been mined at the same time. Due to the propagation speed of blocks across the blockchain network, some nodes will receive one block first, and some nodes will receive the other block first. Therefore, there will be a disagreement about which of these blocks was actually “first” and belongs at the top of the blockchain. The next block to be mined will build on top of one of these blocks, creating a new longest chain. When nodes receive this newest block, the nodes will see that it creates a new longest chain, and will each perform a chain reorganization to adopt it. Transactions inside blocks that are deactivated due to a chain reorganization (such blocks are also known as “orphan blocks”) are no longer part of the transaction history of the blockchain. In such cases, the system may receive a first notification from a first blockchain node identifying a last minted block for the blockchain network. The system may then determine that a previously minted block corresponds to an orphan chain of the blockchain network.

Additionally, or alternatively, a chain reorganization may occur based on detecting a soft fork or a hard fork. In blockchain technology, a soft fork is a change to the software protocol where only previously valid transaction blocks are made invalid. Because old nodes will recognize the new blocks as valid, a soft fork is backwards-compatible. This kind of fork requires only a majority of the miners upgrading to enforce the new rules. In contrast, a hard fork is a radical change to a network's protocol that makes previously invalid blocks and transactions valid, or vice-versa. A hard fork requires all nodes or users to upgrade to the latest version of the protocol software. In such cases, the system may receive a second notification indicating enforcement of a new rule by a subset of miners on the blockchain network. The system may determine that the subset is a majority of miners on the blockchain network.

Additionally, or alternatively, the system may follow a master node (either backed by a single node or a cluster of nodes with sticky session enabled). For example, in a conventional system, the indexing application may always follow the longest chain at any given point in time. In contrast, when the state of a master node diverges from the internal state, the system detects a fork and then updates the internal state to match the node's state. In such a case, the system may designate a first blockchain node of a plurality of blockchain nodes for the blockchain network as having a first node type (e.g., as described above). The system may then receive a third notification, from the first blockchain node, identifying a new canonical chain.
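As an illustrative, non-limiting sketch, a divergence between the internal state and the state of the master node may be detected by comparing block hashes at each chain height. The function name `diff_against_master` and the mapping of height to block hash are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def diff_against_master(
    internal: Dict[int, str], master: Dict[int, str]
) -> Tuple[List[int], List[int]]:
    """Return (heights to remove, heights to add) so internal matches master."""
    removed = sorted(
        height for height, block_hash in internal.items()
        if master.get(height) != block_hash
    )
    added = sorted(
        height for height, block_hash in master.items()
        if internal.get(height) != block_hash
    )
    return removed, added


internal_state = {100: "0xaa", 101: "0xbb"}
master_state = {100: "0xaa", 101: "0xcc", 102: "0xdd"}
removed, added = diff_against_master(internal_state, master_state)
fork_detected = bool(removed)  # an internal block is no longer on the master chain
```

In this sketch, a non-empty list of heights to remove indicates a fork, while heights that only appear in the list to add correspond to ordinary chain growth.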

At step 1010, process 1000 (e.g., using one or more components described above) groups the block by the chain height. For example, the system may, in response to the blockchain network reorganization, determine whether the first sequence number corresponds to a highest sequence number among respective sequence numbers for the plurality of blocks that have the first chain height. Additionally, or alternatively, the system may determine whether the second sequence number corresponds to a highest sequence number among respective sequence numbers for the plurality of blocks that have the second chain height.

To perform the grouping and/or determine whether the first sequence number corresponds to the highest sequence number among respective sequence numbers for the plurality of blocks that have the first chain height, the system may retrieve respective sequence numbers for the plurality of blocks that have the first chain height. The system may rank the respective sequence numbers based on value. The system may determine that the first sequence number corresponds to the highest sequence number based on the ranking.

At step 1012, process 1000 (e.g., using one or more components described above) determines that the block corresponds to the canonical chain based on the sequence number within the grouping. For example, the system may, in response to the blockchain network reorganization, determine that the first block corresponds to a canonical chain for a blockchain network based on determining that the first sequence number corresponds to the highest sequence number among respective sequence numbers for the plurality of blocks that have the first chain height. Additionally, or alternatively, the system may determine that the second block corresponds to the canonical chain for the blockchain network based on determining that the second sequence number corresponds to the highest sequence number among respective sequence numbers for the plurality of blocks that have the second chain height.

At step 1014, process 1000 (e.g., using one or more components described above) updates the blockchain index. For example, the system may update a blockchain index to indicate that the first block corresponds to the canonical chain. Additionally, or alternatively, the system may update the blockchain index to indicate that the second block corresponds to the canonical chain.

In contrast, the system may designate a block as being orphaned. For example, the system may determine that the second block does not correspond to the canonical chain for the blockchain network based on determining that the second sequence number does not correspond to the highest sequence number among respective sequence numbers for the plurality of blocks that have the second chain height. In response to determining that the second block does not correspond to the canonical chain for the blockchain network, the system may update the blockchain index to indicate that the second block corresponds to an orphan chain. For example, a canonical chain may be the chain which is agreed to be the “main” chain by a consensus protocol. Blocks are “orphaned” when they are in one of the “side” chains.
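As an illustrative, non-limiting sketch of steps 1010 through 1014, the blocks may be grouped by chain height and ranked by sequence number, with the block holding the highest sequence number in each group labeled as canonical and the remaining blocks labeled as orphaned. The names `Record` and `classify` and the sample values are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    sequence: int    # mono-increasing sequence number
    height: int      # chain height of the block
    block_hash: str


def classify(records: List[Record]) -> Dict[str, str]:
    """Map each block hash to "canonical" or "orphan"."""
    by_height: Dict[int, List[Record]] = {}
    for record in records:
        by_height.setdefault(record.height, []).append(record)

    status: Dict[str, str] = {}
    for group in by_height.values():
        # Rank the sequence numbers within the height group by value.
        ranked = sorted(group, key=lambda r: r.sequence, reverse=True)
        status[ranked[0].block_hash] = "canonical"
        for record in ranked[1:]:
            status.setdefault(record.block_hash, "orphan")
    return status


index = classify([
    Record(1, 100, "0xaa"),
    Record(2, 101, "0xbb"),
    Record(3, 101, "0xcc"),  # received after a reorganization at height 101
])
```

Because the classification is derived from the sequence records rather than from overwritten rows, re-running it after a reorganization yields the updated index without modifying earlier records.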

It is contemplated that the steps or descriptions of FIG. 10 may be used with any other embodiment of this disclosure. In addition, the steps and descriptions described in relation to FIG. 10 may be done in alternative orders or in parallel to further the purposes of this disclosure. For example, each of these steps may be performed in any order, in parallel, or simultaneously to reduce lag or increase the speed of the system or method. Furthermore, it should be noted that any of the components, devices, or equipment discussed in relation to the figures above could be used to perform one or more of the steps in FIG. 10.

FIG. 11 shows a flowchart of the steps involved in supporting both batch processing and streaming data applications, to load and process data incrementally, while providing a near-constantly materialized dataset based on raw blockchain data, in accordance with one or more embodiments.

At step 1102, process 1100 (e.g., using one or more components described above) receives first on-chain data in a first format. For example, the system may receive, at a data lakehouse layer, first on-chain data in a first format via a first input stream, wherein the first on-chain data originates from a blockchain node of a blockchain network. For example, while the first dataset may comprise structured or semi-structured raw blockchain data, and thus delay error-prone parsing and data augmentation until later, raw blockchain data (even in a structured or semi-structured format) is difficult to use to run applications. For example, to speed up the reprocessing of the raw blockchain data, the system may build different batch processing pipelines; however, the underlying code cannot be reused for streaming processing. As such, the data lakehouse layer may comprise a different data structure type. A data lakehouse is a data solution concept that combines elements of the data warehouse with those of the data lake. Data lakehouses implement data warehouses' data structures and management features for data lakes, which are typically more cost-effective for data storage.

In some embodiments, the first on-chain data may first be processed through one or more layers. For example, the system may receive, at a blockchain-interface layer, the first on-chain data from the blockchain node of the blockchain network, wherein the first on-chain data comprises hexadecimal encoded data from a first block of the blockchain network. The system may then transform the first on-chain data to the first format, using a first compute engine, for storage in a first dataset, wherein the first format comprises data types with field names identified by a respective integer, wherein the first compute engine comprises a first workflow architecture, and wherein the first workflow architecture comprises a first threshold for workflow throughput and a first threshold for a number of workflows.

At step 1104, process 1100 (e.g., using one or more components described above) transforms the first on-chain data to a second format for storage in a second dataset. For example, the system may transform the first on-chain data to a second format for storage in a second dataset. For example, the second dataset may comprise a columnar oriented format, which is best fitted for analytic workloads. For example, the second dataset may represent a cleansed and partitioned dataset (e.g., in contrast to the first dataset, which may comprise raw blockchain data, and the third dataset, which may be curated based on application use cases). For example, the columnar oriented format may preserve local copies (files) of remote data on worker nodes, which may avoid remote reads during instances of a high-volume of event processing.

The second format may comprise an unbounded table. For example, using an unbounded table allows for new data to be quickly integrated into the existing dataset. For example, new data may arrive as an unbounded input table, wherein every new item in the data stream is treated as a new column (or row) in the table.

Additionally or alternatively, the second format may comprise a columnar oriented format. For example, appending the first new on-chain data to the unbounded table as the micro batch may comprise adding a new column to the unbounded table. Instead of keeping a record of every column in a table in a single row, a column-oriented database, and in particular, an unbounded table, may store the data for each column in a single column. The main benefit of a columnar database is faster performance compared to a row-oriented one because it accesses less memory to output data. For example, by doing so, the system may treat all the data arriving (e.g., in an input stream) as an unbounded input table, wherein every new item in the data stream is treated as a new column (or row) in the table. By using the columnar format, only the data needed by a query is loaded into memory. Limiting the amount of data loaded into memory is important from a performance point of view because a typical application only needs to read a small portion of the second dataset (e.g., a balance indexer only cares about the monetary activities). Thus, the use of the columnar format provides performance benefits.

Transforming the first on-chain data to the second format may comprise the system performing numerous steps. For example, the system may detect first new on-chain data in the first input stream. The system may then append the first new on-chain data to the unbounded table as a micro batch. For example, the system may use micro batches to improve performance speed and provide a near-constantly materialized dataset. Micro batch processing is the practice of collecting data in small groups (“batches”) for the purposes of taking action on (“processing”) that data. In contrast, conventional systems may use “batch processing,” which involves taking action on a large group of data. Micro batch processing is a variant of traditional batch processing in that the data processing occurs more frequently so that smaller groups of new data are processed. The system may then store the first new on-chain data in the second dataset.
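As an illustrative, non-limiting sketch, new on-chain data in an input stream may be collected into micro batches and appended to an unbounded table. An in-memory list stands in for the unbounded table, and the function name `micro_batches` and the row fields are hypothetical.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def micro_batches(stream: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive small groups of at most `size` rows from the stream."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


unbounded_table: List[Row] = []
input_stream = ({"height": h, "tx_count": h % 3} for h in range(100, 105))
batch_sizes: List[int] = []
for batch in micro_batches(input_stream, size=2):
    unbounded_table.extend(batch)  # append the micro batch to the table
    batch_sizes.append(len(batch))
```

Smaller batch sizes make new data visible sooner at the cost of more frequent processing, which is the trade-off between micro batch and conventional batch processing noted above.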

Additionally, or alternatively, transforming the first on-chain data to the second format may comprise modifying existing on-chain data in the unbounded table. For example, the system may detect second new on-chain data in the first input stream. The system may modify existing on-chain data in the unbounded table based on the second new on-chain data. The system may store the second new on-chain data in the second dataset. For example, while the system may append a delta table upon detecting new data, the use of the data lakehouse, in particular the features of the data lake, supports upsert and change data feed operations. These operations may be used to update transaction-canonical-view tables and/or indicate canonical chains.

Additionally, or alternatively, transforming the first on-chain data to the second format may comprise modifying existing on-chain data in the unbounded table using specific functions. For example, the system may detect third new on-chain data in the first input stream. The system may modify existing on-chain data in the unbounded table based on the third new on-chain data using a single call to insert or update the existing on-chain data in the unbounded table. The system may store the third new on-chain data in the second dataset. For example, while the system may append the delta table upon detecting new data, the use of the data lakehouse, in particular the features of the data lake, supports upsert and change data feed operations. These operations may be used to update transaction-canonical-view tables and/or indicate canonical chains. For example, using the upsert operation, the system can either insert or update an existing record in one call. To determine whether a record already exists, the upsert statement and/or the system uses the record's identifier as the key to match records, a custom external identifier field, or a standard field (e.g., with an idLookup attribute set to true).
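As an illustrative, non-limiting sketch, an upsert may be expressed as a single call that inserts a record when its key is absent and updates the record when its key is present. A dictionary keyed by a hypothetical `tx_hash` field stands in for the table; an actual data lakehouse would provide its own upsert or merge operation.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def upsert(table: Dict[str, Row], rows: Iterable[Row], key: str = "tx_hash") -> None:
    """Insert each row, or update the existing row with the same key."""
    for row in rows:
        # setdefault creates an empty record for unseen keys; update then
        # overwrites only the fields supplied in the incoming row.
        table.setdefault(row[key], {}).update(row)


canonical_view: Dict[str, Row] = {}
upsert(canonical_view, [{"tx_hash": "0x01", "height": 100, "canonical": True}])
upsert(canonical_view, [
    {"tx_hash": "0x01", "canonical": False},                # updates a record
    {"tx_hash": "0x02", "height": 100, "canonical": True},  # inserts a record
])
```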

Additionally or alternatively, transforming the first on-chain data to the second format may comprise modifying existing on-chain data based on detecting a blockchain network reorganization. For example, the system may detect a blockchain network reorganization. For example, the system may receive a system update based on manual user input indicating that a blockchain network has undergone a reorganization event. Alternatively, or additionally, the system may detect a blockchain reorganization based on detecting a fork in the blockchain network. For example, a chain reorg takes place when a node receives blocks that are part of a new longest chain. The node will then deactivate blocks in its old longest chain in favor of the blocks that build the new longest chain. The system may then modify existing on-chain data in the unbounded table based on the blockchain network reorganization. For example, the system may provide a canonical view of the blockchain state, where new transactions are inserted and orphaned transactions are soft deleted. As such, the system may detect a reorganization event and update data in the second dataset to reflect that one or more transactions are orphaned.
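As an illustrative, non-limiting sketch, a canonical view in which new transactions are inserted and orphaned transactions are soft deleted may be maintained as follows. The `deleted` flag, the function names, and the sample rows are hypothetical.

```python
from typing import Any, Dict, List, Set

Row = Dict[str, Any]


def apply_reorg(view: List[Row], orphaned_blocks: Set[str], new_rows: List[Row]) -> None:
    """Soft delete rows from orphaned blocks, then insert the new rows."""
    for row in view:
        if row["block_hash"] in orphaned_blocks:
            row["deleted"] = True  # the row is kept, only flagged
    view.extend({**row, "deleted": False} for row in new_rows)


def canonical_rows(view: List[Row]) -> List[Row]:
    """Return only the rows that are still on the canonical chain."""
    return [row for row in view if not row["deleted"]]


view: List[Row] = [
    {"tx_hash": "0x01", "block_hash": "0xaa", "deleted": False},
    {"tx_hash": "0x02", "block_hash": "0xbb", "deleted": False},
]
apply_reorg(view, {"0xbb"}, [{"tx_hash": "0x02", "block_hash": "0xcc"}])
```

Retaining the flagged rows preserves the history of the orphaned transactions while queries against the canonical view exclude them.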

At step 1106, process 1100 (e.g., using one or more components described above) generates an output based on the second dataset. For example, the system may output delta tables based on data received (e.g., in an input stream) from the first dataset. The output may comprise tables of additions and deletions for the second dataset, which may comprise a delta table because it stores changes made to the second dataset.

For example, the system may provide numerous advantages through the use of the second format and/or specific outputs. For example, the system may receive, via an application service layer, a static table query. The system may process the static table query using the second dataset. For example, static tables are the master tables that are populated with canned data at the time of creation of the database in a typical system setup. For example, static data refers to a fixed data set, that is, data that remains the same after it is collected. Dynamic data, on the other hand, continually changes after it is recorded in order to maintain its integrity. However, as the system has reformatted the dynamic blockchain data, the system may receive queries on this conceptual input table that can be defined as if it were a static table. For example, the system may automatically convert this batch-like query to a streaming execution plan (e.g., via incrementalization). That is, the system determines what state needs to be maintained to update the result each time a new micro batch arrives. As such, the system allows for better integration of blockchain data with non-blockchain systems.
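As an illustrative, non-limiting sketch of incrementalization, a batch-like aggregation defined over a static table may be maintained as running state that is updated each time a micro batch arrives. The query (a sum of amounts per address), the names `batch_query` and `IncrementalQuery`, and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def batch_query(rows: List[Row]) -> Counter:
    """Batch-like query: total amount per address over a static table."""
    totals: Counter = Counter()
    for row in rows:
        totals[row["address"]] += row["amount"]
    return totals


class IncrementalQuery:
    """Maintains only the state needed to update the result per micro batch."""

    def __init__(self) -> None:
        self.state: Counter = Counter()

    def on_micro_batch(self, rows: List[Row]) -> Counter:
        # Counter.update adds the per-address totals of the new rows.
        self.state.update(batch_query(rows))
        return self.state


micro_batch_1 = [{"address": "alice", "amount": 5}, {"address": "bob", "amount": 2}]
micro_batch_2 = [{"address": "alice", "amount": 3}]

query = IncrementalQuery()
query.on_micro_batch(micro_batch_1)
streaming_result = query.on_micro_batch(micro_batch_2)
static_result = batch_query(micro_batch_1 + micro_batch_2)
```

In this sketch, the streaming result after the last micro batch matches the result of running the same query once over all rows, while the incremental form never re-reads earlier rows.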

It is contemplated that the steps or descriptions of FIG. 11 may be used with any other embodiment of this disclosure. In addition, the steps and descriptions described in relation to FIG. 11 may be done in alternative orders or in parallel to further the purposes of this disclosure. For example, each of these steps may be performed in any order, in parallel, or simultaneously to reduce lag or increase the speed of the system or method. Furthermore, it should be noted that any of the components, devices, or equipment discussed in relation to the figures above could be used to perform one or more of the steps in FIG. 11.

The above-described embodiments of the present disclosure are presented for purposes of illustration and not of limitation, and the present disclosure is limited only by the claims which follow. Furthermore, it should be noted that the features and limitations described in any one embodiment may be applied to any embodiment herein, and flowcharts or examples relating to one embodiment may be combined with any other embodiment in a suitable manner, done in different orders, or done in parallel. In addition, the systems and methods described herein may be performed in real-time. It should also be noted that the systems and/or methods described above may be applied to, or used in accordance with, other systems and/or methods.

The present techniques will be better understood with reference to the following enumerated embodiments:

1. A method, the method comprising: receiving, at a blockchain-interface layer, first on-chain data from a blockchain node of a blockchain network, wherein the first on-chain data comprises hexadecimal encoded data from a first block of the blockchain network, wherein the blockchain-interface layer transforms the first on-chain data to a first format, using a first compute engine, for storage in a first dataset, and wherein the first format comprises data types with field names identified by a respective integer; receiving, at a data lakehouse layer, the first on-chain data in the first format, wherein the data lakehouse layer transforms the first on-chain data to a second format, using a second compute engine, for storage in a second dataset, wherein the second format comprises a columnar oriented format, wherein the second dataset comprises the first on-chain data and second on-chain data, and wherein the second on-chain data is from a second block on the blockchain network; determining an application characteristic for an application that performs blockchain operations using the first on-chain data or the second on-chain data; receiving, at an application service layer, the first on-chain data and the second on-chain data in the second format, wherein the application service layer transforms, using a third compute engine, the first on-chain data and the second on-chain data to a third format for storage in a third dataset, and wherein the third format is dynamically selected based on the application characteristic; and transmitting the first on-chain data and the second on-chain data in the third format to the application.
2. The method of the preceding embodiment, wherein the method is for improved blockchain data indexing by decoupling compute and storage layers.
3. The method of any one of the preceding embodiments, wherein determining the application characteristic further comprises: determining a data freshness requirement for the application; and selecting the third format from a plurality of formats based on the third format corresponding to the data freshness requirement.
4. The method of any one of the preceding embodiments, further comprising: receiving a request, from the application, for the first on-chain data and the second on-chain data; and selecting between the blockchain node, the first dataset, the second dataset, and the third dataset for responding to the request based on the application characteristic.
5. The method of any one of the preceding embodiments, wherein the data lakehouse layer comprises a combination of a data lake with a data warehouse in a single data platform.
6. The method of any one of the preceding embodiments, wherein transforming the first on-chain data to the first format comprises: receiving unstructured on-chain data; determining a structuring condition based on the first format; and applying the structuring condition to the unstructured on-chain data to generate structured on-chain data.
7. The method of any one of the preceding embodiments, wherein transforming the first on-chain data to the first format comprises: receiving unstructured on-chain data; parsing the unstructured on-chain data for an unstructured on-chain data characteristic; and generating a semantic marker for the unstructured on-chain data characteristic, wherein the semantic marker is stored in the first dataset.
8. The method of any one of the preceding embodiments, wherein receiving the first on-chain data from the blockchain node comprises: selecting the first block of the blockchain network; querying the first block for available data matching a retrieval criterion; and executing a retrieval operation to retrieve any available data matching the retrieval criterion.
9. The method of any one of the preceding embodiments, wherein: the first compute engine comprises a first workflow architecture, wherein the first workflow architecture comprises a first threshold for workflow throughput and a first threshold for a number of workflows; wherein the second compute engine comprises a second workflow architecture, wherein the second workflow architecture comprises a second threshold for workflow throughput and a second threshold for the number of workflows, wherein the second threshold for workflow throughput is higher than the first threshold for workflow throughput, and wherein the second threshold for the number of workflows is lower than the first threshold for the number of workflows; and wherein the third compute engine comprises the second workflow architecture.
10. The method of any one of the preceding embodiments, wherein the first dataset maintains the first on-chain data as the hexadecimal encoded data while in the first format, and wherein the second dataset does not maintain the first on-chain data as the hexadecimal encoded data while in the second format.
11. The method of any one of the preceding embodiments, further comprising: designating a first blockchain node of a plurality of blockchain nodes for a blockchain network as having a first node type; and based on designating the first blockchain node of the plurality of blockchain nodes as having the first node type, establishing a session with the first blockchain node.
12. A method, the method comprising: designating a first blockchain node of a plurality of blockchain nodes for a blockchain network as having a first node type; based on designating the first blockchain node of the plurality of blockchain nodes as having the first node type, establishing a session with the first blockchain node; while maintaining the session: determining an order of a first block and a second block on a canonical chain of the blockchain network; designating a second blockchain node and a third blockchain node of the plurality of blockchain nodes as having a second node type; based on designating the second blockchain node and the third blockchain node of the plurality of blockchain nodes as having the second node type, transmitting, in parallel, queries to the second blockchain node and the third blockchain node for first on-chain data from the first block and second on-chain data from the second block, respectively; receiving the first on-chain data or the second on-chain data; and in response to receiving the first on-chain data or the second on-chain data, indexing, in a first dataset, the first on-chain data or the second on-chain data based on the order of the first block and the second block on the canonical chain.
13. The method of any one of the preceding embodiments, wherein the method is for improved blockchain data indexing by decoupling compute and storage layers.
14. The method of any of the preceding embodiments, wherein indexing the first on-chain data or the second on-chain data based on the order of the first block and the second block on the canonical chain further comprises: receiving a second response to the second query, wherein the second response comprises the first on-chain data; determining a first location on the canonical chain corresponding to the first block; labeling the first on-chain data as corresponding to the first location in the first dataset; receiving a third response to the third query, wherein the third response comprises the second on-chain data; determining a second location on the canonical chain corresponding to the second block; and labeling the second on-chain data as corresponding to the second location in the first dataset.
15. The method of any one of the preceding embodiments, wherein transmitting, in parallel, the queries to the second blockchain node and the third blockchain node further comprises: retrieving a second blockchain node identifier of the second blockchain node; transmitting a second query to the second blockchain node based on the second blockchain node identifier, wherein the second query comprises a request for the first on-chain data from the first block; retrieving a third blockchain node identifier of the third blockchain node; and transmitting a third query to the third blockchain node based on the third blockchain node identifier, wherein the third query comprises a request for the second on-chain data from the second block.
16. The method of any one of the preceding embodiments, wherein transmitting, in parallel, the queries to the second blockchain node and the third blockchain node further comprises: retrieving a processing metric indicating a current load on the second blockchain node; comparing the processing metric to a threshold metric; and in response to determining that the processing metric does not equal or exceed the threshold metric, selecting to query the second blockchain node for the second on-chain data.
17. The method of any one of the preceding embodiments, wherein determining the order of the first block and the second block on the canonical chain of the blockchain network further comprises: retrieving a first blockchain node identifier of the first blockchain node; transmitting a first query to the first blockchain node based on the first blockchain node identifier, wherein the first query comprises a request to identify a plurality of blocks on the canonical chain of the blockchain network; and receiving a first response to the first query, wherein the first response identifies the first block and the second block on the canonical chain, and wherein the first response identifies the order of the first block and the second block on the canonical chain.
18. The method of any one of the preceding embodiments, further comprising: identifying a plurality of blockchain nodes for the blockchain network; and determining a plurality of blockchain node identifiers, wherein the plurality of blockchain node identifiers comprises a respective blockchain node identifier for each of the plurality of blockchain nodes.
19. The method of any one of the preceding embodiments, wherein determining the plurality of blockchain node identifiers comprises: designating the respective blockchain node identifier for each of the plurality of blockchain nodes; and configuring each of the plurality of blockchain nodes to output the respective blockchain node identifier in response to a blockchain operation.
20. The method of any of the preceding embodiments, further comprising: designating a fourth blockchain node as having the first node type; detecting a failure in maintaining the session with the first blockchain node; and in response to detecting the failure in maintaining the session with the first blockchain node, re-establishing the session with the fourth blockchain node.
21. The method of any one of the preceding embodiments, wherein determining the order of the first block and the second block on the canonical chain of the blockchain network further comprises: generating a batch application programming interface call to query a range of blocks of the canonical chain, wherein the range of blocks comprises the first block and the second block; and transmitting the batch application programming interface call to the first blockchain node.
22. The method of any one of the preceding embodiments, further comprising receiving, at a blockchain-interface layer, the first on-chain data, wherein the first on-chain data comprises hexadecimal encoded data from the first block of the blockchain network, wherein the blockchain-interface layer transforms, using a first compute engine, the first on-chain data to a first format, and wherein the first format comprises data types with field names identified by a respective integer.
23. A method, the method comprising: receiving on-chain data for a plurality of blocks, wherein the plurality of blocks comprises a first block comprising a first event of a plurality of blockchain events within the on-chain data; determining a first sequence number for the first event; determining a first chain height for the first block; detecting a blockchain network reorganization; in response to the blockchain network reorganization: determining whether the first sequence number corresponds to a highest sequence number among respective sequence numbers for the plurality of blocks that have the first chain height; determining that the first block corresponds to a canonical chain for a blockchain network based on determining that the first sequence number corresponds to the highest sequence number among respective sequence numbers for the plurality of blocks that have the first chain height; and updating a blockchain index to indicate that the first block corresponds to the canonical chain.
24. The method of any one of the preceding embodiments, wherein the method is for creating a reorganization-immune blockchain index using mono-increasing sequence records.
25. The method of any one of the preceding embodiments, wherein the plurality of blocks further comprises a second block comprising a second event of the plurality of blockchain events within the on-chain data, and wherein the method further comprises: determining a second sequence number for the second event; determining a second chain height for the second block; and determining whether the second sequence number corresponds to a highest sequence number among respective sequence numbers for the plurality of blocks that have the second chain height.
26. The method of any one of the preceding embodiments, further comprising: determining that the second block corresponds to the canonical chain for the blockchain network based on determining that the second sequence number corresponds to the highest sequence number among respective sequence numbers for the plurality of blocks that have the second chain height; and updating the blockchain index to indicate that the second block corresponds to the canonical chain.
27. The method of any one of the preceding embodiments, further comprising: determining that the second block does not correspond to the canonical chain for the blockchain network based on determining that the second sequence number does not correspond to the highest sequence number among respective sequence numbers for the plurality of blocks that have the second chain height; and in response to determining that the second block does not correspond to the canonical chain for the blockchain network, updating the blockchain index to indicate that the second block corresponds to an orphan chain.
28. The method of any one of the preceding embodiments, wherein the first sequence number is assigned to the first event based on an order in which the first event was received by an indexing application.
29. The method of any one of the preceding embodiments, wherein the first sequence number is assigned to the first event based on an order of the first block and a second block on the canonical chain of the blockchain network.
30. The method of any one of the preceding embodiments, wherein detecting the blockchain network reorganization comprises: receiving a first notification from a first blockchain node identifying a last minted block for the blockchain network; and determining that a previously minted block corresponds to an orphan chain of the blockchain network.
31. The method of any one of the preceding embodiments, wherein detecting the blockchain network reorganization comprises: receiving a second notification indicating an enforcement of a new rule by a subset of miners on the blockchain network; and determining that the subset is a majority of miners on the blockchain network.
32. The method of any one of the preceding embodiments, wherein detecting the blockchain network reorganization comprises: designating a first blockchain node of a plurality of blockchain nodes for the blockchain network as having a first node type; and receiving a third notification, from the first blockchain node, identifying a new canonical chain.
33. The method of any one of the preceding embodiments, wherein determining whether the first sequence number corresponds to the highest sequence number among respective sequence numbers for the plurality of blocks that have the first chain height further comprises: retrieving respective sequence numbers for the plurality of blocks that have the first chain height; ranking the respective sequence numbers based on value; and determining that the first sequence number corresponds to the highest sequence number based on the ranking.
34. A method, the method comprising: receiving, at a data lakehouse layer, first on-chain data in a first format via a first input stream, wherein the first on-chain data originates from a blockchain node of a blockchain network; transforming the first on-chain data to a second format for storage in a second dataset, wherein the second format comprises an unbounded table, and wherein transforming the first on-chain data to the second format comprises: detecting first new on-chain data in the first input stream; appending the first new on-chain data to the unbounded table as a micro batch; and storing the first new on-chain data in the second dataset; and generating an output based on the second dataset.
35. The method of any one of the preceding embodiments, wherein the method is for supporting both batch processing and streaming data applications, to load and process data incrementally, while providing a near-constantly materialized dataset based on raw blockchain data.
36. The method of any one of the preceding embodiments, further comprising: receiving, via an application service layer, a static table query; and processing the static table query using the second dataset.
37. The method of any one of the preceding embodiments, wherein transforming the first on-chain data to the second format further comprises: detecting second new on-chain data in the first input stream; modifying existing on-chain data in the unbounded table based on the second new on-chain data; and storing the second new on-chain data in the second dataset.
38. The method of any one of the preceding embodiments, wherein transforming the first on-chain data to the second format further comprises: detecting third new on-chain data in the first input stream; modifying existing on-chain data in the unbounded table based on the third new on-chain data using a single call to insert or update the existing on-chain data in the unbounded table; and storing the third new on-chain data in the second dataset.
39. The method of any one of the preceding embodiments, wherein transforming the first on-chain data to the second format further comprises: detecting a blockchain network reorganization; and modifying existing on-chain data in the unbounded table based on the blockchain network reorganization.
40. The method of any one of the preceding embodiments, wherein transforming the first on-chain data to the second format further comprises: accessing an application binary interface for a smart contract corresponding to the first on-chain data; determining an on-chain event in the first on-chain data based on the application binary interface; and storing the on-chain event in the second dataset.
41. The method of any one of the preceding embodiments, further comprising: receiving, at a blockchain-interface layer, the first on-chain data from the blockchain node of the blockchain network, wherein the first on-chain data comprises hexadecimal encoded data from a first block of the blockchain network; and transforming the first on-chain data to the first format, using a first compute engine, for storage in a first dataset, wherein the first format comprises data types with field names identified by a respective integer, wherein the first compute engine comprises a first workflow architecture, and wherein the first workflow architecture comprises a first threshold for workflow throughput and a first threshold for a number of workflows.
42. The method of any one of the preceding embodiments, wherein the second dataset comprises the first on-chain data and second on-chain data, and wherein the second on-chain data is from a second block on the blockchain network.
43. The method of any one of the preceding embodiments, wherein transforming the first on-chain data to the second format for storage in the second dataset comprises using a second compute engine, wherein the second compute engine comprises a second workflow architecture, wherein the second workflow architecture comprises a second threshold for workflow throughput and a second threshold for a number of workflows, wherein the second threshold for workflow throughput is higher than the first threshold for workflow throughput, and wherein the second threshold for the number of workflows is lower than the first threshold for the number of workflows.
44. The method of any one of the preceding embodiments, wherein generating the output based on the second dataset comprises generating an append-only delta table comprising added and removed events.
45. The method of any one of the preceding embodiments, wherein the second format comprises a columnar oriented format, and wherein appending the first new on-chain data to the unbounded table as the micro batch comprises adding a new column to the unbounded table.
46. A tangible, non-transitory, machine-readable medium storing instructions that, when executed by a data processing apparatus, cause the data processing apparatus to perform operations comprising those of any of embodiments 1-45.
47. A system comprising one or more processors; and memory storing instructions that, when executed by the processors, cause the processors to effectuate operations comprising those of any of embodiments 1-45.
48. A system comprising means for performing any of embodiments 1-45.
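By way of a non-limiting illustration of embodiments 23, 27, and 33, the determination of the canonical block at each chain height may be sketched as follows. The record layout and the function names are hypothetical; the sketch assumes only that each block record carries a chain height and a mono-increasing sequence number, and that the record with the highest sequence number at a given height is treated as canonical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRecord:
    block_hash: str
    height: int
    sequence: int  # mono-increasing sequence number (assumed unique)


def highest_sequence_by_height(records):
    """Map each chain height to the record with the highest sequence number."""
    best = {}
    for record in records:
        current = best.get(record.height)
        if current is None or record.sequence > current.sequence:
            best[record.height] = record
    return best


def build_index(records):
    """Label every block as 'canonical' or 'orphan' for the blockchain index."""
    best = highest_sequence_by_height(records)
    return {
        record.block_hash: (
            "canonical"
            if best[record.height].block_hash == record.block_hash
            else "orphan"
        )
        for record in records
    }


records = [
    BlockRecord("0xaaa", 100, 1),
    BlockRecord("0xbbb", 101, 2),  # later replaced by a reorganization
    BlockRecord("0xccc", 101, 3),  # received after the reorganization
]
index = build_index(records)
```

Because the later-received record at height 101 carries the higher sequence number, it is labeled canonical and the earlier record at that height is labeled as belonging to an orphan chain, without rewriting the earlier record.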

What is claimed is:
 1. A system for creating a reorganization-immune blockchain index using mono-increasing sequence records, the system comprising: one or more processors; and a non-transitory computer-readable medium having instructions recorded thereon that when executed by the one or more processors causes operations comprising: receiving, at a blockchain-interface layer, on-chain data for a plurality of blocks, wherein the plurality of blocks comprises a first block comprising a first event of a plurality of blockchain events within the on-chain data; transforming the on-chain data to a first format, wherein the first format comprises data types with field names identified by a respective integer; storing the on-chain data, in a first dataset at the blockchain-interface layer, wherein the first dataset comprises hexadecimal encoded data, and wherein the first dataset uses the first format; determining a first sequence number for the first event in the on-chain data; determining a first chain height for the first block; detecting a blockchain network reorganization; in response to the blockchain network reorganization: determining whether the first sequence number corresponds to a highest sequence number among respective sequence numbers for the plurality of blocks that have the first chain height; and determining that the first block corresponds to a canonical chain for a blockchain network based on determining that the first sequence number corresponds to the highest sequence number among respective sequence numbers for the plurality of blocks that have the first chain height; and updating a blockchain index to indicate that the first block corresponds to the canonical chain.
 2. A method for creating a reorganization-immune blockchain index using mono-increasing sequence records, the method comprising: receiving on-chain data for a plurality of blocks, wherein the plurality of blocks comprises a first block comprising a first event of a plurality of blockchain events within the on-chain data; determining a first sequence number for the first event; determining a first chain height for the first block; determining whether the first sequence number corresponds to a highest sequence number among respective sequence numbers for the plurality of blocks that have the first chain height; determining that the first block corresponds to a canonical chain for a blockchain network based on determining that the first sequence number corresponds to the highest sequence number among respective sequence numbers for the plurality of blocks that have the first chain height; and updating a blockchain index to indicate that the first block corresponds to the canonical chain.
 3. The method of claim 2, wherein the plurality of blocks further comprises a second block comprising a second event of the plurality of blockchain events within the on-chain data, and wherein the method further comprises: determining a second sequence number for the second event; determining a second chain height for the second block; and determining whether the second sequence number corresponds to a highest sequence number among respective sequence numbers for the plurality of blocks that have the second chain height.
 4. The method of claim 3, further comprising: determining that the second block corresponds to the canonical chain for the blockchain network based on determining that the second sequence number corresponds to the highest sequence number among respective sequence numbers for the plurality of blocks that have the second chain height; and updating the blockchain index to indicate that the second block corresponds to the canonical chain.
 5. The method of claim 3, further comprising: determining that the second block does not correspond to the canonical chain for the blockchain network based on determining that the second sequence number does not correspond to the highest sequence number among respective sequence numbers for the plurality of blocks that have the second chain height; and in response to determining that the second block does not correspond to the canonical chain for the blockchain network, updating the blockchain index to indicate that the second block corresponds to an orphan chain.
 6. The method of claim 2, wherein the first sequence number is assigned to the first event based on an order in which the first event was received by an indexing application.
 7. The method of claim 2, wherein the first sequence number is assigned to the first event based on an order of the first block and a second block on the canonical chain of the blockchain network.
 8. The method of claim 2, further comprising detecting a blockchain network reorganization, wherein detecting the blockchain network reorganization comprises: receiving a first notification from a first blockchain node identifying a last minted block for the blockchain network; and determining that a previously minted block corresponds to an orphan chain of the blockchain network.
 9. The method of claim 2, further comprising detecting a blockchain network reorganization, wherein detecting the blockchain network reorganization comprises: receiving a second notification indicating an enforcement of a new rule by a subset of miners on the blockchain network; and determining that the subset is a majority of miners on the blockchain network.
 10. The method of claim 2, further comprising detecting a blockchain network reorganization, wherein detecting the blockchain network reorganization comprises: designating a first blockchain node of a plurality of blockchain nodes for the blockchain network as having a first node type; and receiving a third notification, from the first blockchain node, identifying a new canonical chain.
 11. The method of claim 2, wherein determining whether the first sequence number corresponds to the highest sequence number among respective sequence numbers for the plurality of blocks that have the first chain height further comprises: retrieving respective sequence numbers for the plurality of blocks that have the first chain height; ranking the respective sequence numbers based on value; and determining that the first sequence number corresponds to the highest sequence number based on the ranking.
 12. A non-transitory computer-readable medium having instructions recorded thereon that when executed by one or more processors causes operations comprising: receiving on-chain data for a plurality of blocks, wherein the plurality of blocks comprises a first block comprising a first event of a plurality of blockchain events within the on-chain data; determining a first sequence number for the first event; determining a first chain height for the first block; determining whether the first sequence number corresponds to a highest sequence number among respective sequence numbers for the plurality of blocks that have the first chain height; determining that the first block corresponds to a canonical chain for a blockchain network based on determining that the first sequence number corresponds to the highest sequence number among respective sequence numbers for the plurality of blocks that have the first chain height; and updating a blockchain index to indicate that the first block corresponds to the canonical chain.
 13. The non-transitory, computer-readable medium of claim 12, wherein the plurality of blocks further comprises a second block comprising a second event of the plurality of blockchain events within the on-chain data, and wherein the method further comprises: determining a second sequence number for the second event; determining a second chain height for the second block; and determining whether the second sequence number corresponds to a highest sequence number among respective sequence numbers for the plurality of blocks that have the second chain height.
 14. The non-transitory, computer-readable medium of claim 13, wherein the instructions further cause operation comprising: determining that the second block corresponds to the canonical chain for the blockchain network based on determining that the second sequence number corresponds to the highest sequence number among respective sequence numbers for the plurality of blocks that have the second chain height; and updating the blockchain index to indicate that the second block corresponds to the canonical chain.
 15. The non-transitory, computer-readable medium of claim 13, wherein the instructions further cause operations comprising: determining that the second block does not correspond to the canonical chain for the blockchain network based on determining that the second sequence number does not correspond to the highest sequence number among respective sequence numbers for the plurality of blocks that have the second chain height; and in response to determining that the second block does not correspond to the canonical chain for the blockchain network, updating the blockchain index to indicate that the second block corresponds to an orphan chain.
 16. The non-transitory, computer-readable medium of claim 12, wherein the first sequence number is assigned to the first event based on an order in which the first event was received by an indexing application.
 17. The non-transitory, computer-readable medium of claim 12, wherein the first sequence number is assigned to the first event based on an order of the first block and a second block on the canonical chain of the blockchain network.
 18. The non-transitory, computer-readable medium of claim 12, wherein the instructions further cause operations comprising detecting a blockchain network reorganization, wherein detecting the blockchain network reorganization comprises: receiving a first notification from a first blockchain node identifying a last minted block for the blockchain network; and determining that a previously minted block corresponds to an orphan chain of the blockchain network.
 19. The non-transitory, computer-readable medium of claim 12, wherein the instructions further cause operations comprising detecting a blockchain network reorganization, wherein detecting the blockchain network reorganization comprises: receiving a second notification indicating an enforcement of a new rule by a subset of miners on the blockchain network; and determining that the subset is a majority of miners on the blockchain network.
 20. The non-transitory, computer-readable medium of claim 12, wherein the instructions further cause operations comprising detecting a blockchain network reorganization, wherein detecting the blockchain network reorganization comprises: designating a first blockchain node of a plurality of blockchain nodes for the blockchain network as having a first node type; and receiving a third notification, from the first blockchain node, identifying a new canonical chain.
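
By way of non-limiting illustration only, the canonical-chain determination recited above (comparing a block's sequence number against the highest sequence number among blocks at the same chain height, then updating the index) may be sketched as follows. The names `BlockRecord`, `build_index`, and the string labels "canonical" and "orphan" are hypothetical and are introduced solely for this example. The sketch further assumes that each block carries a single sequence number and that sequence numbers are unique. It is not the claimed implementation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class BlockRecord:
    """Hypothetical index entry: one block observed by the indexer."""
    block_hash: str
    height: int    # chain height of the block
    sequence: int  # mono-increasing sequence number (assumed unique)


def build_index(records: Iterable[BlockRecord]) -> Dict[str, str]:
    """Label each block 'canonical' or 'orphan'.

    A block is labeled canonical when its sequence number is the highest
    among all observed blocks that share its chain height.
    """
    records = list(records)

    # Group sequence numbers by chain height.
    by_height: Dict[int, List[int]] = {}
    for rec in records:
        by_height.setdefault(rec.height, []).append(rec.sequence)

    # Rank the sequence numbers at each height by value, keeping the highest.
    highest = {h: sorted(seqs)[-1] for h, seqs in by_height.items()}

    # Update the index entry for every block.
    return {
        rec.block_hash: (
            "canonical" if rec.sequence == highest[rec.height] else "orphan"
        )
        for rec in records
    }


# Example: block "b" at height 101 is replaced by "b2" after a reorganization.
observed = [
    BlockRecord("a", 100, 1),
    BlockRecord("b", 101, 2),
    BlockRecord("b2", 101, 3),
    BlockRecord("c", 102, 4),
]
index = build_index(observed)
```

In this example, the block received later at height 101 carries the higher sequence number, so it is labeled as belonging to the canonical chain and the earlier block at that height is labeled as belonging to an orphan chain.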